Abstract

As the number of processors in distributed-memory multiprocessors grows, efficiently
supporting a shared-memory programming model becomes difficult. We have designed the
Protocol for Hierarchical Directories (PHD) to allow shared-memory support for systems
containing massive numbers of processors. PHD eliminates bandwidth problems by using a
scalable network, decreases hot-spots by not relying on a single point to distribute blocks,
and uses a scalable amount of space for its directories. PHD provides a shared-memory
model by synthesizing a global shared memory from the local memories of processors. PHD
supports sequentially consistent read, write, and test-and-set operations.

This thesis also introduces a method of describing locality for hierarchical protocols and
employs this method in the derivation of an abstract model of the protocol behavior. An
embedded model, based on the work of Johnson [13], describes the protocol behavior when
mapped to a k-ary n-cube. The thesis uses these two models to study the average height in
the hierarchy that operations reach, the longest path messages travel, the number of
messages that operations generate, the inter-transaction issue time, and the protocol overhead
for different locality parameters, degrees of multithreading, and machine sizes.

We determine that multithreading is only useful for approximately two to four threads; any
additional interleaving does not decrease the overall latency. For small machines and high
locality applications, this limitation is due mainly to the length of the running threads. For
large machines with medium to low locality, this limitation is due mainly to the protocol
overhead being too large.

Our study using the embedded model shows that in situations where the run length between
references to shared memory is at least an order of magnitude longer than the time to process
a single state transition in the protocol, applications exhibit good performance. If separate
controllers for processing protocol requests are included, the protocol scales to 32k processor
machines as long as the application exhibits hierarchical locality: at least 22% of the global
references must be able to be satisfied locally; at most 35% of the global references are
allowed to reach the top level of the hierarchy.

Chapter 1

Introduction



The shared-memory model has been a convenient programming paradigm for multiprocessors.
As the number of elements in multiprocessors grows, however, efficiently supporting a
shared-memory programming model becomes difficult. Bus-based snooping schemes suffice
for only small numbers of processors; they are inadequate for large numbers of processors
because their bandwidth does not grow with the number of processors [8]. Directory-based
caching schemes, on the other hand, allow sharing among large numbers of processors when
implemented on network-based computers; the bandwidth of the network must increase
with the number of processors. Hierarchical directory-based schemes have the potential to
scale indefinitely because they have neither the space requirements of full-map directory
schemes nor the limited number of copies requirements of limited-directory schemes nor the
linear dependence on the number of copies for invalidation of chained schemes. Hierarchical
schemes additionally exploit the spatial and temporal locality of processes running on
a machine. We have designed the Protocol for Hierarchical Directories (PHD) to provide
shared memory for systems composed of massive numbers of processors.

PHD synthesizes a global shared memory from the private local memories of processors.
Processors access global addresses in the same manner as they access local ones. The
system operates on blocks (or lines) consisting of several words of data, capitalizing on
spatial locality. PHD maintains sequential consistency [14] in its support of read, write,
and test-and-set operations.

In this thesis we also introduce a method of describing locality for hierarchical protocols.
We derive an abstract model of the behavior of the Protocol for Hierarchical Directories using
this method and use it to study the average height reached in the hierarchy per operation,
the longest path of messages traveled per operation, and the number of messages generated
per operation for different machine configurations. We validate this model using a
trace-driven simulator.

The abstract model is used to generate inputs to an embedded model. The embedded
model describes how the protocol behaves when mapped to a k-ary n-cube using our
proposed mapping scheme.

1.1 Directory-Based Protocols

Many of the ideas used in the Protocol for Hierarchical Directories evolved from
directory-based protocol research as well as early hierarchical protocols. Most of the early research
assumed certain capabilities, such as a broadcast ability, which do not scale well. Several
other hierarchical protocols have been proposed since the start of this work.

1.1.1 Flat Directory-Based Protocols

All directory-based protocols keep a record associated with each block of main memory.
There have been a wide variety of directory schemes proposed and studied. Tang [26]
proposed a write-back scheme in which the main memory and every cache must keep a
directory. In order to find a block, all of the individual directories need to be checked.
Censier and Feautrier [7] first proposed the concept of a "dirty bit" which indicates whether
or not the value stored in main memory is the most recent one. They also added a bit vector
to the main memory directory indicating which caches have copies of the block. These
additions eliminated the need to search every local directory after every data modification.
Agarwal [4] discussed these schemes and their lack of scalability due either to the need
to broadcast or to the presence of a bottleneck. He mentioned the idea of distributing the
directory across the memories, in order to prevent any bottlenecks. Chaiken [8] showed that
directories are scalable and that some shared-data caching schemes, for many applications,
perform better than schemes which cache only private data. The shared-data schemes that
he looked at include limited directory, full map, and singly and doubly-linked chains.

Limited directory schemes employ a limited number of pointers to keep track of which
processors have copies of particular blocks; when a new processor wants a copy and the
limited number of pointers have all been allocated, the scheme must resort to broadcast or
to invalidation. In a full-map directory, all processors can have copies of any block.
Singly-linked chains distribute the directory entry, threading it through the processors which have
copies. Doubly-linked chains use a double linkage, to allow the chain to be followed either
way.
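
The pointer-overflow behavior of a limited directory can be sketched as follows. This is a minimal illustration and not any of the cited schemes: the class name `LimitedDirectoryEntry`, the `max_pointers` parameter, and the choice to invalidate the oldest sharer on overflow (rather than broadcast) are all assumptions made for the example.

```python
class LimitedDirectoryEntry:
    """Directory entry holding at most `max_pointers` sharer pointers (hypothetical)."""

    def __init__(self, max_pointers):
        self.max_pointers = max_pointers
        self.sharers = []       # processor ids holding a copy, oldest first
        self.invalidated = []   # log of processors whose copies were invalidated

    def add_sharer(self, proc):
        if proc in self.sharers:
            return
        if len(self.sharers) == self.max_pointers:
            # Pointer overflow: this sketch invalidates the oldest sharer.
            # A real scheme might broadcast instead.
            victim = self.sharers.pop(0)
            self.invalidated.append(victim)
        self.sharers.append(proc)


entry = LimitedDirectoryEntry(max_pointers=2)
for proc in (3, 7, 9):
    entry.add_sharer(proc)
```

With two pointers, the third request forces the entry to give up one existing copy, which is the cost that full-map and chained schemes avoid in different ways.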

1.1.2 Hierarchical Schemes

In the above-mentioned directory schemes, the home location of a particular block is
statically fixed: any processor without a copy of the block in its cache which wishes to access
that block must look in a single fixed location. Hierarchical directory schemes were designed
both to reduce this static requirement by providing adaptive data migration and to solve
the limited bandwidth problem of single bus schemes.

A hierarchical scheme, in general, has a tree structure. At the lowest level of the tree are
processors with caches; at the other levels are directories recording which blocks are cached
by nodes located physically below them in the tree. Any number of copies of a block are
allowed to exist at a time. A read request is typically propagated up the tree until a copy
is located; a write typically involves locating and invalidating all of the extra copies by tree
traversal and then performing the write.
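
The upward propagation of a read in a generic hierarchical scheme can be sketched as below. The sketch assumes complete multi-level inclusion (every directory records every block cached beneath it); the names `Node`, `cache_block`, and `read_height` are hypothetical and chosen only for this illustration.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.blocks_below = set()   # blocks cached at or below this node


def cache_block(leaf, block):
    """Record a copy at `leaf` and in every directory above it (full inclusion)."""
    node = leaf
    while node is not None:
        node.blocks_below.add(block)
        node = node.parent


def read_height(leaf, block):
    """Levels a read climbs before reaching a node that knows of a copy.

    Returns None if no copy exists anywhere in the tree.
    """
    node, height = leaf, 0
    while node is not None:
        if block in node.blocks_below:
            return height
        node, height = node.parent, height + 1
    return None


root = Node("root")
a, b = Node("a", root), Node("b", root)
a0, a1, b0 = Node("a0", a), Node("a1", a), Node("b0", b)
cache_block(a0, "X")
```

Here a read of `"X"` from `a1` stops at the shared parent `a`, while a read from `b0` has to climb to the root.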

In [30] Wilson proposes the first hierarchical multiprocessor architecture. He suggests
modifications to several bus-based schemes in order to form a protocol for his proposed
hierarchy which uses shared buses of caches to form the tree. He does not, however, consider
how his ideas would work on very large scale systems. Archibald, in [5], proposes another
solution intended for a small hierarchy of buses, remarking that his protocol is feasible for
a two-level hierarchy, but not necessarily a three-level, or four-level one.

Haridi and Hagersten [12] later proposed a hierarchical scheme which was designed for a
much larger system: the Data Diffusion Machine (DDM). Their architecture also assumes
a tree composed of buses, which forces all requests to be routed through the hierarchy. The
intermediate level directories store information as to whether each block cached below is
also cached anywhere above or whether it is exclusive to that subtree, thus allowing
them to reduce traffic on the higher-level buses during writes. Their architecture also
eliminates the need for a home location for a block. Their hierarchical scheme typifies a
COMA, or Cache-Only Memory Architecture, as defined in Stenstrom's [25] paper.

In a later paper, Yang, Thangadurai and Bhuyan [32] proposed a similar hierarchical
bus scheme which also keeps track of the exclusivity status for each block. Unlike Haridi's
scheme, however, they assume that the main memory is situated at the top of the hierarchy,
providing a static place for blocks to be stored in when they are thrown out of the caches.

Scott and Goodman describe a hierarchical scheme for processors connected using a
k-ary n-cube network in [23] and [24]. Their mapping scheme provides rings, which replace
buses as the broadcast method for their protocol. They also introduce the concept of
pruning caches, which eliminate the need of all of the earlier protocols for complete
multi-level inclusion, i.e. keeping a higher level directory entry for every lower level one. Pruning
caches allow a trade-off between directory size and network bandwidth to be dynamically
made, and could be added to PHD.

Maa, Pradhan, and Thiebaut [18] [19] are currently working on a hierarchical directory
scheme for non-bus-based architectures, but have not yet fully worked out the details of the
protocol.

The Kendall Square Research Company has built a system with a ring-based hierarchical
directory scheme [6]. In their scheme multi-level inclusion is required. They have not
released much information about their protocol.

Parthasarathy [21] studied an earlier version of PHD, DHP, described in [29]. Although
his refined version of DHP traverses the hierarchy fewer times than does PHD, it will
deadlock under certain conditions. Parthasarathy's work does not consider this problem.
His protocol also does not guarantee that a read operation will make enough progress to
complete even in the absence of deadlock since write operations can skip ahead of reads
indefinitely.

1.1.3 PHD

The Protocol for Hierarchical Directories is a tree-based hierarchical directory protocol.
Any number of processors can have read-only copies of a block in their caches. To find
a block, a processor sends a location message which travels upwards until a node which
knows of a copy is found. This node sends a message which travels downwards until it
reaches a node with a copy. The node with a copy sends the block directly to the requesting
node, which then sends a confirmation message upwards to indicate that it has finished its
read. In this manner, reads can be satisfied in the lowest common subtree containing the
requester and a copy of the block.
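
The height of the lowest common subtree can be illustrated with a small calculation. The sketch below assumes a balanced tree in which leaves are numbered consecutively and every parent has `arity` children; this numbering is an assumption for the example and is not the mapping the thesis proposes in Section 2.2. The function name `lowest_common_height` is hypothetical.

```python
def lowest_common_height(requester, holder, arity):
    """Height of the smallest subtree containing both leaves.

    Leaves are numbered 0..N-1 left to right in a balanced tree where
    every internal node has `arity` children (assumed layout).
    """
    height = 0
    while requester != holder:
        # Moving one level up maps a node index to its parent's index.
        requester //= arity
        holder //= arity
        height += 1
    return height


# In a binary tree, neighbouring leaves 0 and 1 share a parent at height 1,
# while leaves 0 and 3 first meet at height 2.
near = lowest_common_height(0, 1, arity=2)
far = lowest_common_height(0, 3, arity=2)
```

Under this layout, the closer the requester is to an existing copy, the lower in the tree the read is resolved.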

Write operations involve finding all of the copies of a block in the system and deleting
them. Only the nodes in the smallest subtree containing all copies of the block and the
write requester are involved in a write. The owner of the block transfers ownership to
the node requesting the write. Acknowledgments of deletion from all of the nodes which
previously had copies are combined, and an acknowledge message is sent downwards to the
node that requested the write. The test-and-set operation is actually a test-and-test-and-set
operation; it is implemented as an optimized combination of the read and write operations.

1.2 Analysis and Locality Modeling

Many previous models [15] [28] [31] of hierarchical cache consistency protocols have
modeled bus architectures, and as such, considered bus traffic effects to be most important.
Leutenegger and Vernon, in [15], assume uniform cache miss rates across the machine.
Yang [31] assumes a single-level clustering model for reference rates, where each smallest
group of processors is equally likely to access some blocks and all other processors are equally
less likely to access those blocks. Vernon, Jog, and Sohi [28] do not directly consider data
locality; instead, they choose fixed miss ratios for different level caches.

Scott, in [24], actually calculates the traffic for a ring-based hierarchical system. He
assumes best-case, worst-case, and random data-access patterns in his study.

Johnson [13] studies locality and its effects on multiprocessor performance. He derives a
combined model of applications, communication mechanisms, and interconnection networks,
and uses the result to show that "exploiting communication locality provides gains which
are at most linear in the factor by which average communication distance is reduced," as
long as the outstanding number of communications per processor is bounded. We use his
model as the basis for the embedded model studies of Chapter 6.

Stenstrom, Joe, and Gupta [25] compare the performance of a COMA architecture
with that of a NUMA (non-uniform memory architecture). They find that the COMA
architecture performs worse than the NUMA one for many situations, such as those where
coherence misses dominate over capacity misses. Many of their assumptions, however, do not
apply to the work described in this thesis. They assume a 16 processor configuration, where
the effects of locality are going to be less important than on a massively parallel machine.
They also assume that the COMA architecture will be running Haridi and Hagersten's
DDM protocol. PHD, on the other hand, as will be explained in Section 1.4, not only uses
a shorter path in order to satisfy read requests, but also eliminates the replacement problem
of the DDM protocol. The paper concludes with their proposal of COMA-F, a flat COMA
architecture. COMA-F, like PHD, has a master (owner) node.

1.3 J-Machine 

The cache coherence protocol was designed as part of the J-Machine [9] project at MIT.
The J-Machine is a massively parallel, fine-grained message-passing concurrent computer.
Although the J-Machine was designed to efficiently support a message-passing language,
it provides inexpensive synchronization primitives to support other programming models
as well. The cache coherence protocol was developed in the context of considering
shared-memory programming environments for the J-Machine.

1.4 Contributions

The Protocol for Hierarchical Directories differs in several ways from previously proposed
hierarchical schemes. It is designed for message-passing multicomputer systems which use
small cache block sizes. It is both scalable and strongly coherent.

1.4.1 Scalability

PHD is scalable in cost and latency, as defined by Scott in [23]. He requires that cost grow
slower than O([illegible]) with machine size, and latency grow no faster than O([illegible]).

Cost: The cost of the hardware includes the cost of the network and the cost of the
directory which stores tag bits. A k-ary n-cube, as long as the dimensionality properly
increases with size, grows at less than O([illegible]). The directory overhead for PHD is
O(N log N) by Scott's definitions. Therefore, PHD is scalable in cost.

Latency: As shown in Chapter 5, the unloaded-network predicted latency per read or
write operation scales at less than O([illegible]). The latency due to protocol overhead for the
proposed embedding of PHD into a k-ary n-cube depends on the total number of messages
sent and thus the degree of sharing.

Bottleneck at the Top of the Hierarchy: Unlike in a bus-based hierarchical architecture,
requests which span across the machine are not constrained to cross through the same
point. PHD distributes the levels of the hierarchy across each node of a machine. There is
no bottleneck at "the" top directory, because there are different top directories for different
blocks. This mapping is described in Section 2.2.

1.4.2 Messages and Longest Path Traversed

Because PHD is not restricted by the machine architecture to follow the hierarchy at all
times, both the longest path traveled and the number of messages generated per read are
shorter and fewer than in an enforced hierarchy.

Longest Path per Operation: As shown in Figure 1.1, a read in PHD is satisfied directly
after two traversals of the hierarchy and a single direct message to deliver the data. Strict
hierarchies require four hierarchy traversals before a read result can be used.

Messages per Operation: The number of messages per read operation is also smaller in
PHD than in a standard hierarchy. As Figure 1.2 illustrates, only three traversals of the
hierarchy worth of messages plus one direct data-delivery message need to be sent for PHD
as opposed to four traversals of the hierarchy worth of "messages" for the strict hierarchy.

Figure 1.1: The left side of the figure shows the path a read request must fulfill before it
receives the data for the other protocols. The right side shows how the path is shorter for
PHD.

Figure 1.2: The left side of the figure shows the path a read request must fulfill in order to
complete for the other protocols. This path is identical to the number of messages which
must be sent. The right side shows how the path is shorter for PHD.
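
The read comparison described in this section can be turned into a small arithmetic sketch. It assumes that one traversal of the hierarchy costs `height` hops, where `height` is the height of the subtree in which the read is resolved, and that the direct data delivery counts as a single message; both are simplifying assumptions for illustration, and the function name `read_cost` is hypothetical.

```python
def read_cost(height):
    """Return (phd, strict) read costs for a subtree of the given height.

    Assumption: one hierarchy traversal = `height` hops/messages,
    and the direct data delivery in PHD = 1 message.
    """
    phd = {
        "longest_path": 2 * height + 1,  # up, down, then direct delivery
        "messages": 3 * height + 1,      # up, down, confirmation up, plus data
    }
    strict = {
        "longest_path": 4 * height,      # four traversals before data is usable
        "messages": 4 * height,
    }
    return phd, strict


phd, strict = read_cost(3)
```

Under these assumptions the message counts coincide when the read resolves one level up (`height` of 1), and the advantage of PHD grows with the height of the subtree involved.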

1.4.3 Ownership

The concept of ownership [10] as used in this protocol was derived from both Li [16] [17] and
Totty [27]. An owner of a block is responsible for it. Any other node can only have a copy
of the block, which can be asynchronously thrown away in order to make room for other
blocks. That node can then inform the rest of the system at its leisure without affecting the
correctness of the protocol. This ability to throw away unneeded copies of blocks without
the global transactions required by Haridi and Hagersten results in less time needed in order
to invalidate blocks when caches are full.

1.4.4 Mapping Scheme

This thesis proposes a mapping scheme designed to map hierarchical cache coherence
protocols onto non-toroidal k-ary n-cubes. This scheme allows easy calculation of parent and
child nodes, and is designed to reduce communication to the area of the network containing
participating nodes.

1.4.5 Locality Model

This thesis introduces a method of describing locality for hierarchical cache coherent
protocols and incorporates this method in a model. The thesis also shows how the method can
be used to accurately predict the longest path traveled per operation and the number of
messages sent per operation.

1.4.6 Embedded Model

This thesis also introduces a model for describing the behavior of PHD as embedded into
a k-ary n-cube. This model is based on the work of Johnson [13], and models applications,
processors, and networks. The model is used to show that the embedding will scale well
for applications with moderate locality in situations where the number of cycles needed to
process the protocol transactions is small.

1.5 Thesis Overview

The focus of this thesis is the description and the modeling of a hierarchical, directory-based
cache coherence protocol. Chapter 2 describes the Protocol for Hierarchical Directories in
moderate detail and proposes an embedding of the protocol into a k-ary n-cube. Chapter 3
discusses some of the issues involved in designing hierarchical protocols. Chapter 4 outlines
the simulator used to test and explore PHD; this chapter also explains the simulator verifier.

Two models were used to study the protocol. The abstract model, which considers the
protocol running on an abstract hierarchy, is described in Chapter 5. Chapter 6 extends
this model to show how the protocol behaves when embedded as proposed in Chapter 2.
Chapter 7 concludes the thesis, outlining areas of future research.



Chapter 2



Protocol Overview 



This chapter provides a description of the behavior of the Protocol for Hierarchical
Directories. A table listing the exact behavior of the protocol is located in Appendix G. This
chapter also briefly outlines a method of mapping a hierarchy to a k-ary n-cube. The next
chapter discusses the issues involved in the design of a hierarchical protocol.

2.1 Protocol Description

This section explains the operations used by PHD to ensure consistency while coordinating
the global read, write, and test-and-set operations. Section 2.1.1 briefly describes the
operation of the protocol. Sections 2.1.2 and 2.1.3 outline the definitions and the notations
used in the description of the protocol. Sections 2.1.4, 2.1.5, and 2.1.6 explain the protocol
in more detail, describing the read, write, and test-and-set operations, respectively.

2.1.1 Overview

PHD supports three essential global primitives: read, write, and test-and-set. Any number
of nodes can have read-only copies of a block in their caches. To find a block, a node asks
its parent for a copy. The parent must know which of its child subtrees have copies. If none
do, it forwards the message upwards. If one does, the read message is forwarded to it. Read
operations can therefore be satisfied locally.

Write operations involve finding all of the copies of a block in the system and deleting
them. Only the nodes in the smallest subtree which contains all copies of the block and the
write requester are involved in the write process. Acknowledgments of deletion from all of
the nodes which previously had copies are combined, and an acknowledge message is sent
down to the node requesting the write. The owner of the block transfers ownership and
a valid copy of the block to the write requester. The test-and-set operation is actually a
test-and-test-and-set operation; it is implemented as an optimized combination of the read
and write operations.

2.1.2 Definitions

There are two types of directory entries in the hierarchy. The first type, a leaf level entry,
represents an actual block of cached data and would be found in the memory of a node
at the bottom of the hierarchy. The second type, a parent entry, is a directory that stores
information about which child subtrees have copies of a particular block. The parent entries
correspond to memory on some node of the hierarchy which is not at the leaf level.

Every cache entry on a leaf node may be purgeable or unpurgeable. Purgeable entries may
be deleted at any time. One copy of every block must never be deleted; the node designated
as the owner is responsible for keeping this master copy until ownership is passed on. The
only purgeable entries are ones which are in the readable state yet are not owned. A full
list of the possible states of a leaf entry is shown in Table 2.1.
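
The purgeability rule can be expressed as a one-line filter. The sketch assumes that leaf states are represented as strings and that the readable, not-owned state is spelled `readable_nowner`; the state spelling, the dictionary layout, and the function name `purgeable_blocks` are illustrative assumptions.

```python
def purgeable_blocks(cache):
    """Blocks that may be dropped at any time: readable and not owned.

    `cache` maps a block name to its leaf-entry state (assumed string form).
    """
    return sorted(block for block, state in cache.items()
                  if state == "readable_nowner")


cache = {
    "A": "readable_yowner",   # owner holds the master copy: never purge
    "B": "readable_nowner",
    "C": "writable",
    "D": "waiting_for_read",  # no data yet, so not a purge candidate
    "E": "readable_nowner",
}
candidates = purgeable_blocks(cache)
```

Only `B` and `E` are candidates for replacement; every other entry is either the master copy or part of an operation in progress.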

A parent entry consists of a vector containing two bits of state for every child subtree,
three additional bits of state, and a pointer to the subtree that the current write request, if
any, was sent from. The entry will indicate which of four possible states each child subtree
is in: invalid, confirmed, valid, or waiting. The invalid state means that there is no copy of
that block in that subtree. The confirmed state means that either at least one node in that
subtree has a copy of that block or somewhere below a message is propagating upwards
indicating that the block has been deleted. The valid state means that an operation is
occurring in the subtree that will eventually make the subtree confirmed. The waiting state
means that the subtree has at least one node which is waiting for the result of a read, and
that the parent entry needs to send the block down to that node upon receiving the data.
The waiting state is employed by the protocol to support read combining. A subtree vector
can only have certain combinations of these states, shown in Table 2.2.

State                                      Description

readable_yowner                            Entry is readable.
                                           Node is owner.

readable_nowner                            Entry is readable.
                                           Node is not owner.

waiting_for_read                           Node is waiting for a read to complete.
                                           Node is not owner.

writable                                   Entry is writable.
                                           Node is owner.

waiting_for_write_nowner_nfi_nread         Node is waiting for a write to complete.
                                           Node is not owner.
                                           Invalidation has not yet reached this node.
                                           Node may not respond to read messages.

waiting_for_write_nowner_nfi_yread         Node is waiting for a write to complete.
                                           Node is not owner.
                                           Invalidation has not yet reached this node.
                                           Node has valid value which can be distributed.

waiting_for_write_yowner_nfi               Node is waiting for a write to complete.
                                           Node is owner.
                                           Invalidation has not yet reached this node.
                                           Node has valid value which can be distributed.

waiting_for_write_nowner_yfi_nread         Node is waiting for a write to complete.
                                           Node is not owner.
                                           Invalidation has reached this node.
                                           Node may not respond to read messages.

waiting_for_write_nowner_yfi_yread         Node is waiting for a write to complete.
                                           Node is not owner.
                                           Invalidation has reached this node.
                                           Node has valid value which can be distributed.

waiting_for_write_ok_yowner_nfi            Node is waiting for a write; only needs final ack.
                                           Node is owner.
                                           Invalidation has not yet reached this node.
                                           Node has valid value which can be distributed.

waiting_for_write_ok_yowner_yfi            Node is waiting for a write; only needs final ack.
                                           Node is owner.
                                           Invalidation has reached this node.
                                           Node has valid value which can be distributed.

waiting_for_write_value_nowner_yfi_nread   Node is waiting for a write; only needs ownership.
                                           Node is not owner.
                                           Invalidation has reached this node.
                                           Node may not respond to read messages.

waiting_for_write_value_nowner_yfi_yread   Node is waiting for a write; only needs ownership.
                                           Node is not owner.
                                           Invalidation has reached this node.
                                           Node has valid value which can be distributed.

waiting_for_tas                            (Full set corresponding to waiting_for_write set).

Table 2.1: The possible states of a leaf cache entry.

Combination    Description

v0_w0_cX       All subtrees are either confirmed or invalid.
vX_w0_c0       All subtrees are either valid or invalid.
vX_wX_c0       All subtrees are valid, waiting, or invalid.
vX_w0_cX       All subtrees are valid, confirmed or invalid.
vX_wX_cX       All subtrees are valid, waiting, confirmed or invalid.

Table 2.2: The possible combinations of states in the subtree vector of a cache parent entry.
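
One way to picture the subtree vector is as a packed word with two bits per child. The sketch below is illustrative only: the numeric codes for the four states, the helper names `pack`, `unpack`, and `combination`, and the reading of the combination names (X as "at least one subtree in that state", 0 as "none") are assumptions rather than details given in this section. Whether an all-invalid vector can exist as an entry is not stated here; the sketch simply reports it as not listed.

```python
# Two-bit state codes (assumed values; the text only says two bits per subtree).
INVALID, CONFIRMED, VALID, WAITING = range(4)


def pack(states):
    """Pack one 2-bit state per child subtree into a single integer."""
    word = 0
    for i, state in enumerate(states):
        word |= state << (2 * i)
    return word


def unpack(word, n_children):
    """Recover the per-child states from a packed word."""
    return [(word >> (2 * i)) & 0b11 for i in range(n_children)]


def combination(states):
    """Name the vector in the v?_w?_c? style (X = present, 0 = absent)."""
    def flag(state):
        return "X" if state in states else "0"
    return f"v{flag(VALID)}_w{flag(WAITING)}_c{flag(CONFIRMED)}"


# Combinations listed in the table of subtree-vector states.
LISTED = {"v0_w0_cX", "vX_w0_c0", "vX_wX_c0", "vX_w0_cX", "vX_wX_cX"}

vector = [CONFIRMED, INVALID, VALID, WAITING]
word = pack(vector)
```

Under this reading, the combinations missing from the table are exactly those with a waiting subtree but no valid one, plus the all-invalid vector.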

All parent entries are marked as either shared or exclusive. An exclusive entry indicates
that all copies are within the current subtree. All entries at the top level node of the
hierarchy, by definition, are exclusive. A directory entry on a node which is marked as
shared, on the other hand, indicates that there may be a copy outside of the subtree rooted
at that node.

A parent entry may be locked or unlocked. If an entry is locked, then all messages which
wish to access it must wait until it is unlocked. Messages which unlock an entry are of
course not required to wait. During a write, when a parent entry is locked, there are two
more possible state modifiers a node might have: on_request_path and writer_acknowledged.
If the node containing the parent entry is located on a direct path between the writing node
and the top of the write, it is on_request_path. If the parent entry is on the request path of
the write, the final state, writer_acknowledged, indicates whether or not the writing subtree
has acknowledged the write invalidation. Table 2.3 shows these states.
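
The four attributes of a parent entry described above line up with the four fields of the state names in the table of parent-entry states (for example `S_L_YOP_NGA`). A small decoder makes the correspondence explicit. The function name `decode_parent_state` and the dictionary keys are hypothetical, and the field meanings are inferred from the table descriptions.

```python
def decode_parent_state(name):
    """Split a parent-entry state name into its four attributes.

    Field meanings are inferred from the state table:
      S/E      shared or exclusive
      U/L      unlocked or locked
      NOP/YOP  not on / on the request path of the current write
      NGA/YGA  writing subtree has not / has acknowledged
    """
    sharing, lock, path, ack = name.split("_")
    return {
        "exclusive": sharing == "E",
        "locked": lock == "L",
        "on_request_path": path == "YOP",
        "writer_acknowledged": ack == "YGA",
    }


state = decode_parent_state("S_L_YOP_NGA")
```

Note that the modifiers only appear in their "yes" form on locked entries, matching the statement that they apply during a write while the entry is locked.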

There are eighteen different messages used by the protocol. They are listed in Table 2.4,
and will be explained as they are used.

2.1.3 Notation 



Throughout thi s chapter, di agrans of trees wll be shown. These trees are vi rtual trees , 
and do not actual 1 y exi st on the t ypi cal ar chi t ecture PHDw 1 1 be napped to. The nappi ng 
is descri bed i n Secti on 2. 2. Except \\here noted, the figures onl y consi der a si ngl e cache 
bl ock. 
State          Description

S_U_NOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is unlocked.

E_U_NOP_NGA    Entry is only contained in this subsystem (exclusive).
               Entry is unlocked.

S_L_NOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is not located on path from writer to top node for the write.

S_L_YOP_NGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has not yet received acknowledge from the writing subtree.

E_L_YOP_NGA    Entry is only contained in this subsystem (exclusive).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has not yet received acknowledge from the writing subtree.

S_L_YOP_YGA    Entry is also contained in another subsystem (shared).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has received acknowledge from the writing subtree.

E_L_YOP_YGA    Entry is only contained in this subsystem (exclusive).
               Entry is locked.
               Entry is located on path from writer to top node for the write.
               Entry has received acknowledge from the writing subtree.

Table 2.3: The possible states of a cache parent entry.
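The naming scheme of Table 2.3 can be illustrated with a small sketch. The class and field names below are hypothetical, and the reading of the suffixes (NOP/YOP for not on or on the request path, NGA/YGA for acknowledgment not yet or already received) is an interpretation of the table rather than something the protocol definition states explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentEntryState:
    """Hypothetical encoding of one parent entry state from Table 2.3."""
    exclusive: bool            # E if True, S (shared) if False
    locked: bool               # L if True, U (unlocked) if False
    on_request_path: bool      # YOP if True, NOP if False
    writer_acknowledged: bool  # YGA if True, NGA if False

    def name(self) -> str:
        return "_".join([
            "E" if self.exclusive else "S",
            "L" if self.locked else "U",
            "YOP" if self.on_request_path else "NOP",
            "YGA" if self.writer_acknowledged else "NGA",
        ])


# Only the seven combinations listed in Table 2.3 are treated as legal.
LEGAL_STATES = {
    "S_U_NOP_NGA", "E_U_NOP_NGA", "S_L_NOP_NGA", "S_L_YOP_NGA",
    "E_L_YOP_NGA", "S_L_YOP_YGA", "E_L_YOP_YGA",
}


def is_legal(state: ParentEntryState) -> bool:
    """True if the flag combination appears in Table 2.3."""
    return state.name() in LEGAL_STATES
```

Of the sixteen possible flag combinations only seven occur, which reflects that the two write modifiers are meaningful only while an entry is locked.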




A node with no copy of the data. 

A node with a valid copy of the data. 

A node requesting an operation. No copy of the data. 



Figure 2.1: This figure explains the symbols used throughout the chapter. Note that a
node with a "valid" copy of the block is an imprecise description, basically implying that
the node, if leaf, has a copy of the block in a readable or writable state. If a grey node is
not a leaf node, it is assumed to have at least one subtree in the confirmed state.
Message                                   Description

find_lowest_common_for_read               Look upwards for nearest node with value.
redirected_find_lowest_common_for_read    Look again; failed in current subtree.
read                                      Walk downwards to a node with value.
find_lowest_common_for_write              Look for lca of all nodes with value.
lock                                      Lock all nodes below with value.
ack                                       All copies below are invalid.
ackl                                      All copies below except writer's are invalid.
throwing_away                             Subtree below invalid; once was confirmed.
change_to_exclusive                       Node is least common ancestor of all copies.
find_lowest_common_for_tas                Look upwards for nearest node with value.
redirected_find_lowest_common_for_tas     Look again; failed in current subtree.
confirm_value                             Subtree below has gotten a copy of value.
read_data                                 Level = 0: Send data directly to reader.
                                          Level > 0: Send data to waiting subtrees.
unconfirm_value                           Subtree below now not confirmed, not invalid.
read_tas                                  Walk downwards to a node with value.
write_ok                                  No other copies left in tree.
s_write_own                               Ownership transfer message for writes.
tas_failed                                Test-and-set failed in initial read stage.

Table 2.4: The messages sent by the protocol.
[Figure 2.2 panels: locate; send a copy directly; confirm]

Figure 2.2: This diagram shows the three phases which occur during a read operation.
Node 5 wants to read X, so it sends messages to locate X. This first phase, locating a block,
finishes when Node 6 is informed that Node 5 wants a copy. At that point, phase two starts,
in which a copy is sent directly to Node 5. Finally, in phase three, confirmation that the
value arrived is sent to Node 2 and then from Node 2 to Node 1.



At the level of detail of the figures in this chapter, nodes may be in one of three states
as shown in Figure 2.1: invalid, valid, and requesting an operation. These states apply
intuitively to both leaf and parent entries. Note that the "valid" state for a parent node is
most similar to having a confirmed subtree.



2.1.4 Reading

A read to a locally cached block occurs immediately. On a read miss, however, a
three-phase operation must be performed, as sketched in Figure 2.2. The first phase locates the
nearest block while simultaneously updating the states in the hierarchy. The second phase
sends a copy of the block directly to the requesting node. The third sends a confirmation
that the node has actually received the copy. There are two possible complications to a
read. First, a write may be going on at the same time. Second, the copy chosen to be
replicated may be deleted locally before the read request reaches it. Both of these problems
are handled by the protocol.

A node which wishes to locate a block for a read sends a find_lowest_common_for_read
message to its parent. If the parent has no record of the block, it sends the same message
up, until a node is found that has it. If the block exists, the message traveling upwards will
eventually arrive at a node which knows where the block is. If the block does not exist, the
protocol will allocate it automatically or signal an error, whichever it has been configured
to do.

Figure 2.3: The states that a leaf node can enter during a read.

If the entry at that node is unlocked, and at least one subtree is confirmed but none are
valid, the node must update its vector of who has the block and pick one of the confirmed
subtrees to send the read message down to. If any of the subtrees are valid, the requesting
subtree is marked as waiting, and the read process suspends here. When the valid subtrees
are changed to confirmed, indicating that they have finished the read, the values are sent
down to all waiting subtrees. This mechanism supports read combining.

When a non-leaf node receives a read message, it changes the state of its entry to shared,
since the request must have come from outside of the subtree it heads, and forwards the read
message down towards the confirmed subtrees and leaves. When a leaf entry in a readable
or writable state receives a read, it sends a purgeable copy (using the read_data message)
directly to the requesting node. When the requesting node receives the value, it sends a
confirm_value message upwards to its parent, to confirm that it has received a copy of the
block.

Complications to a read can occur when a node deletes a block which some other node
is trying to read. This deletion may cause a read message to reach a node which no longer
knows about the block. In this case, a redirected_find_lowest_common_for_read message is
sent upwards to find a different record of the block.

In all of the cases described above, if a node is ever reached whose entry for the block
being read is locked, the read is temporarily halted. This halting permits the serialization
of reads and writes.

Figure 2.3 shows the states that can occur in a leaf node during a read. When a node
wishes to read a block which it has not locally cached, it enters a waiting_for_read state
and sends a find_lowest_common_for_read message to its parent. In a normal read case,
the node will be informed by a read_data message of the value, and will then enter the
readable_nowner state. If, on the other hand, a write starts before the read completes, the
node may receive a lock message before the new value. The node blocks the write from
completing by not acknowledging the lock until it receives the new value and completes its
read.

[Figure 2.4 panel label: send a copy directly]

Figure 2.4: This diagram illustrates how read combining occurs. As shown in the left tree,
Node 5 requests a read just as in Figure 2.2. At any time between the locate message
reaching Node 2 and the confirm message reaching it, Node 4 also decides to read the same
value, and sends a locate message to its parent, Node 2. Node 2 records that Node 4 is
waiting for that block. As shown in the right tree, when the confirmation message from
Node 5 reaches Node 2, Node 2 sends the data from the read to Node 4, which responds
with a confirmation. Note that only one confirmation is sent from Node 2 to Node 1;
confirmation is sent by Node 2 only after receiving confirmation from Node 5.

If several nodes simultaneously try to read a block which is not already widely
distributed, the messages will be combined. For example, if three leaf nodes with the same
parent try to read block X, they will all send find_lowest_common_for_read messages to their
parent. When the first message reaches the parent, it will update its entry to record that
the node which sent the first message, Node 1, has a valid (but not confirmed) copy, and
forward up the request. The second message to arrive will result in Node 2's state being
changed to waiting. This time, of course, the parent node does not forward the request
upwards. The third message to arrive will result in a change of Node 3's state to waiting.
When the value of X is sent to Node 1, it will send that data to the parent as part of
the confirm_value message. The parent will then send the data to all of its subtrees in the
waiting state, in this case Nodes 2 and 3. A similar example of read combining is illustrated
in Figure 2.4.
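The three-reader example can be traced with a minimal sketch of one parent's directory entry. The class and method names are hypothetical, the sketch covers only the case where no copy exists yet below the parent, and the state given to a waiting child after the data is delivered is an assumption.

```python
class ParentEntry:
    """Sketch of one parent's directory entry for a single block."""

    def __init__(self):
        self.state = {}      # child id -> "valid" | "waiting" | "confirmed"
        self.forwarded = []  # children whose request was sent further up

    def on_find(self, child):
        """Handle find_lowest_common_for_read arriving from `child`."""
        if any(s == "valid" for s in self.state.values()):
            # A read for this block is already in flight: combine.
            self.state[child] = "waiting"
        else:
            # First request: record a valid (unconfirmed) copy, forward up.
            self.state[child] = "valid"
            self.forwarded.append(child)

    def on_confirm_value(self, child, data):
        """Handle confirm_value from `child`; return data to send down."""
        self.state[child] = "confirmed"
        deliveries = {}
        for other, s in list(self.state.items()):
            if s == "waiting":
                deliveries[other] = data
                # Assumption: the child is valid until it confirms in turn.
                self.state[other] = "valid"
        return deliveries
```

With requests arriving from Nodes 1, 2, and 3 in that order, only the first is forwarded upwards, and the confirmation from Node 1 releases the data to Nodes 2 and 3.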
[Figure 2.5 panels: write; copies deleted]

Figure 2.5: This is a sketch of the write process. Node 5 wants to write X, so it sends
messages to locate X. When the locate messages reach the lowest common ancestor of
Node 5 and all nodes that know about X (in this case Node 1), lock messages are sent down
to every node which has X. Each leaf node receiving a lock message deletes its copy and
sends an acknowledgment upwards. The owner (6) additionally sends a copy of X to 5.
When all of the acknowledgments have been collected, Node 5 is sent a message. Node 5
may now update X.



2.1.5 Writing

A global write involves finding all of the copies of a block in the system, locking them,
deleting them, transferring ownership and the current value to the new owner, and then
performing the actual write. This process is shown in Figure 2.5.

Several consecutive write requests from a single node to a particular location can be
fulfilled quickly and easily. As soon as a node has written to a block once, it has sole
ownership and control over that block, and can thus perform consecutive reads or writes
locally until another node requests a copy.

A node wishing to write to a block which it does not have a writable copy of must first
create an entry in the state waiting_for_write_nowner_npl_nread, or modify an existing entry to
be in the waiting_for_write_nowner_npl_yread or waiting_for_write_yowner_npl state, as
appropriate. The location phase then begins. The node sends a find_lowest_common_for_write
message to the node above it in the hierarchy. If that node has no record of the block, it
sends the same message up.

The locate phase continues until the lowest common ancestor (lca) of the block is
reached. The calculation of the lowest common ancestor for a block considers all leaf
nodes containing the block and the node requesting the write (see Figure 2.6). Note that
the lca node is the highest node in the hierarchy that is involved in the write. An lca node
receiving a find_lowest_common_for_write message locks its entry, signifying the beginning
of the lock phase and ensuring the serialization of writes. It then sends down lock messages
to every node which has a copy of the block.

Figure 2.6: The least common ancestor for a block depends not only upon the block, but
also upon where the writer is located. The lca for block y, cached in Nodes 5 and 6, from
the point of view of Nodes 5, 6, or 2, is Node 2. From the point of view of all of the other
nodes, the lca for y is Node 1. The lca for block x, from any node's point of view, is Node
1. The definition of the lca for a block from the perspective of a node N is that the lca is
the first node whose entry is tagged exclusive in a path starting from N going up to the
node at the highest level of the hierarchy. Nodes 1 and 2 label block y as exclusive. Nodes
2 and 3 label block x as shared. Node 1 labels block x as exclusive.
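The dependence of the lca on the writer's position can be checked with a short sketch. It computes the lca centrally from a complete parent map, whereas the protocol discovers it in a distributed way by following exclusive tags upwards. The tree shape and the function names are hypothetical; the tree is chosen only so that Nodes 5 and 6 share Node 2 as their parent.

```python
# Hypothetical tree: Node 1 is the root; child -> parent.
PARENT = {2: 1, 3: 1, 5: 2, 6: 2, 7: 3, 8: 3}


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca_for_write(writer, holders, parent):
    """Lowest node whose subtree contains the writer and every copy holder."""
    needed = set(holders) | {writer}
    for candidate in path_to_root(writer, parent):
        # A node's subtree contains n exactly when it lies on n's root path.
        if all(candidate in path_to_root(n, parent) for n in needed):
            return candidate
    raise ValueError("nodes do not belong to one tree")
```

For a block held by Nodes 5 and 6, a write from Node 5 stops at Node 2, while a write from Node 7 must climb to Node 1.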

Most nodes receiving the lock message will have copies of the block unlocked.¹ Non-leaf
nodes with records of the block lock their entries, and forward down the lock messages
to all those nodes below them which have the block.

All of the leaf nodes with copies of the block will receive lock messages. Those with
purgeable copies just erase the copies and send an ack message up immediately. The old
owner of the block will have an unpurgeable entry. This owner node first sends a copy of the
block directly to the node requesting the write, then deletes its copy and sends up an ack
message. The main purpose of the copy message is to transfer the ownership of the block
and to give the writer the old value to distribute if necessary to the reads serialized before
the write. This prevents a deadlock situation, which will be described more fully later.

The node that is in the state waiting_for_write will also have an unpurgeable version
of the block. This is the node that requested the write. If two writes were requested at
approximately the same time, the one who the lock message records as the writer is the
one that won the race. The other write will be waiting on a locked entry somewhere. The
winning node sends up a special acknowledgment, ackl, indicating that it is on the path of
the write.

Figure 2.7: The finite state machine describing a leaf node entry during a write. These
states are all approximate; the exact transitions are described in Table G.5.

¹ If a node has deleted a block and the information that this has happened is still propagating upwards,
some nodes may receive a lock message but not have a record of the block. In this case they immediately
send up an acknowledgment of deletion.

The streams of ack and ackl messages signal the combining phase of a write. This
phase is used to ensure that every copy of the block is deleted before any modifications
are performed. Each parent node collects ack and ackl messages until it receives responses
from all of its involved children. If the parent node is on the direct path between the writing
node and the lca node, it sends up an ackl as soon as there is only one subtree below it with
a copy, and it has already received an ackl from a subtree. The single remaining subtree
contains the node which requested the write. If the parent node is not on the write path,
it waits until it receives ack messages from all subtrees directly below it which had copies,
and then deletes its record of the block. The lca node for the block will be on the path
for deletion. When it receives the last acknowledgment, it sends a write_ok message down
to the node requesting the write and unlocks its cache entry. The write_ok message travels
through all nodes that were on the write path, unlocking them as it descends.
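The collection rule for a parent below the lca can be sketched as follows. The class name is hypothetical, and the sketch assumes that an off-path parent forwards an ack upwards once all of its subtrees have answered, which the description of the ack message suggests but the text above does not spell out. The lca itself, which answers with write_ok instead, is not modeled.

```python
class WriteCollector:
    """Sketch of how a non-lca parent combines ack and ackl messages."""

    def __init__(self, children_with_copies, on_write_path):
        self.pending = set(children_with_copies)  # subtrees still holding a copy
        self.on_write_path = on_write_path
        self.got_ackl = False
        self.sent = None  # message already sent upwards, if any

    def receive(self, child, kind):
        """Process "ack" or "ackl" from `child`; return what to send up, or None."""
        if self.sent is not None:
            return None
        if kind == "ackl":
            # The writer's subtree answers with ackl and keeps its copy.
            self.got_ackl = True
        else:
            self.pending.discard(child)
        if self.on_write_path:
            # Only the writer's subtree remains, and it has answered.
            if self.got_ackl and len(self.pending) == 1:
                self.sent = "ackl"
        elif not self.pending:
            # Every copy below is gone; the record can be deleted.
            self.sent = "ack"
        return self.sent
```

A parent on the write path therefore answers only when both conditions hold, regardless of the order in which the ack and ackl messages arrive.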

The node requesting the write will receive two final messages, in an indeterminate order.
One is the s_write_own message, which contains the value of the data and performs the
ownership transfer. The other is the write_ok message, which indicates that all other copies
on the system have been deleted. Only after receiving both messages does the node change
the state of the entry to writable and perform the write. Figure 2.7 shows the finite state
machine representing this sequence.
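Because the two final messages may arrive in either order, the writer has to remember which of them it has already seen. A minimal sketch, with a hypothetical class name and a simplified pair of states in place of the full leaf state machine:

```python
class PendingWrite:
    """Sketch of a writer waiting for s_write_own and write_ok."""

    def __init__(self):
        self.value = None
        self.have_own = False
        self.have_ok = False
        self.state = "waiting_for_write"

    def receive(self, message, value=None):
        """Record one of the two final messages and return the entry state."""
        if message == "s_write_own":
            self.have_own = True
            self.value = value  # old value, transferred with ownership
        elif message == "write_ok":
            self.have_ok = True
        else:
            raise ValueError(f"unexpected message: {message}")
        if self.have_own and self.have_ok:
            self.state = "writable"
        return self.state
```

The entry becomes writable only on the second of the two messages, whichever that turns out to be.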

In the protocol, any read in progress when a write reaches a certain point will complete
before the write does. In particular, when a lock message reaches a leaf node in the
waiting_for_read state the lock will be delayed at the node until a value is actually sent there.
After the read, the block is purged from the cache, and an acknowledge is sent. In order to
avoid deadlock, the protocol always allows at least one node to distribute the old value on
demand. Before the ownership transfer, the owner will have the value for distribution; after
the ownership transfer the writer will have and distribute the value. Reads attempting to
complete during the later stages of a write often end up being sent to the writer.

2.1.6 A Synchronization Primitive

Although the read and write primitives which operate on shared memory ensure consistency,
they do not provide a simple method for synchronization between nodes. We have therefore
included the test-and-set (tas) instruction. This primitive is included for completeness,
and could be implemented better by a variety of methods [11] [20].

The TAS is a combination of a read and a write. First a read is performed, up to the
point where a copy of the value is located. If the copy is non-zero, the TAS fails, and a
tas_failed message is sent to the requesting node (see Figure 2.8). If the copy's value is
zero, the write phase begins. The "write" continues just until the requesting node would
be about to perform the write. At this point, the value is again checked. If it is non-zero,
no value is written. If it is zero, the TAS completes successfully. This second check must be
performed in order to ensure the atomicity of the test-and-set. A diagram of a successful
TAS is shown in Figure 2.9.
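The two checks can be sketched as a single function. The three callbacks are hypothetical stand-ins for the protocol phases: locating a copy, gaining ownership through the write phase, and performing the final store. The value written on success (1) is an assumption.

```python
def tas(locate_value, acquire_for_write, store):
    """Sketch of the test-and-set; returns True if the variable was set.

    locate_value      -- read phase: value of the nearest copy
    acquire_for_write -- write phase up to ownership; returns current value
    store             -- performs the final write
    """
    # First check: fail early without disturbing other copies.
    if locate_value() != 0:
        return False
    # Gain ownership; another node may have set the value in the meantime.
    current = acquire_for_write()
    # Second check: needed for atomicity.
    if current != 0:
        return False
    store(1)
    return True
```

The early check corresponds to the failure case of Figure 2.8, and the second check covers a rival that wins the variable while the write phase is in progress.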

Although the test-and-set primitive was designed with barrier synchronization in mind,
it is still not as good as a mechanism specially designed for barrier synchronization.
Synchronization using the provided test-and-set primitive does a preliminary read before attempting
to gain ownership of the test-and-set variable in order to reduce useless thrashing. To
perform a barrier synchronization, however, every node will still have to gain ownership of the
block at some point. On a machine such as the J-Machine, the separate message facility
can be used by an application to build a more efficient barrier synchronization.

[Figure 2.8 labels: locate; tas failed; U = unpurgeable]

Figure 2.8: The diagram shows the steps of a test-and-set which fails in phase one. Node
5 tries to perform a TAS on X. Node 5 does not find X locally, and sends a locate message
up to Node 2. Node 2 knows where a copy is, so sends a message down to Node 4. Node
4 examines X, and finds out that X is non-zero, implying that the test-and-set has failed.
Node 4 therefore sends a message to Node 5 telling it that the tas has failed.

[Figure 2.9 panels: tas; copies deleted]

Figure 2.9: The diagram shows the steps of a test-and-set which completes successfully.
The first part is the same as in Figure 2.8 and is not repeated here. After Node 4 verifies
that X is 0, it begins the same steps as would happen in a write. The lca node for X
(Node 1) is found. It sends lock messages to all nodes which have copies of X. Those nodes
delete their copies, and send acknowledgments upwards. After both the value and the final
acknowledgment are sent to Node 5, it checks to make sure X is still 0. If so, it sets X.

2.2 Physical Layout 

The hierarchy is mapped to a physical machine in such a way as to realize hierarchical
locality as physical locality. The mapping is also designed to split the address space so as
to increase bandwidth and prevent bottlenecks at higher levels of the tree. The mapping
is designed to work for all k-ary n-cubes, although the protocol may not perform well on
configurations such as high-dimensional cubes.

Each processor stores part of the global address space. The locations of every block
are stored in a hierarchical directory, forming the virtual tree described in Section 2.1. The
virtual tree is composed of virtual nodes, each of which may be mapped onto several physical
processors. This mapping allows us to form different physical tree traversal patterns: one
for each set of addresses.

2.2.1 Hierarchical Directory

A directory records which nodes have copies of blocks. In a multiple level system, every
parent node at level 1 knows which of its child nodes have copies of a block. Every parent
node above level 1 stores which of its child nodes are the roots of subtrees containing copies
of a block at their leaves. To locate a block that is not stored locally, a node sends an
inquiry which will travel upwards until a copy is found.

In order to increase bandwidth, the directories of the virtual nodes at every level are
split onto many physical processors. This splitting is shown in Figure 2.10. Each leaf
node is mapped directly onto a unique physical processor. The parent (non-leaf) nodes are
distributed equally onto all processors of the machine, while maintaining locality. The top
node of the tree is distributed onto all nodes of the machine. The mapping is also designed
to promote locality: every physical processor stores part of a node from every level. This
implies that some requests can traverse the entire tree while staying local to a processor.

Figure 2.11 shows a hierarchical directory embedded into a two-dimensional mesh
network. The highest level of a virtual tree consists of a single node. Its four children are the
four level 2 nodes which compose that single node. The four children of a level 2 node are
the four level 1 nodes, and of a level 1 node are four level 0 nodes. Level 0 nodes correspond
to leaves of the tree, and are physical processors. Each virtual parent node can contain
information about any block, yet each of the physical processors composing a parent node
can only hold some predetermined subset of the blocks, based on the block addresses. This
mapping results in physical locality, because any messages traveling in the hierarchy will
always stay within sub-cubes.

[Figure 2.10 legend: a physical processor; a virtual node]

Figure 2.10: The virtual address tree is split to increase bandwidth. In this case, a 3 level
radix 2 tree is mapped onto a 4-ary 1-cube. Each virtual leaf node is stored on a unique
physical processor. The first-level parent nodes are each split onto two physical processors
(forming sub-lines). The second-level parent nodes (in this case the root node) are split
onto four physical processors (forming a sub-line of double the size of the first-level ones). In
a k-ary 1-cube, the parent of a leaf node will be located in the same two-processor sub-line
as the leaf node itself. The grandparent of a leaf node will be located in the same four-processor
sub-line as the leaf node itself. For every additional level in the radix two tree, the number
of processors needs to be doubled.

Figure 2.11: A conceptual view of a two, three, and four level tree. Each group at a level
becomes a single node at the next highest level.

The hierarchical directory can also be viewed as consisting of multiple trees. As an
example, consider the mapping of a virtual 3 level, radix 4 tree to a physical 4-ary 2-cube
shown in Figure 2.12. The collection of nodes that can store a particular address forms a
complete tree. In this example, sixteen different trees are formed, each rooted at a different
processor. Because the trees for different addresses are different, there is no bottleneck at
the "top node" of the hierarchy.
Figure 2.12: Trees embedded into a 2-dimensional grid. Only two out of sixteen are shown.



2.2.2 Mapping Function



The mapping function is used to calculate the node number of the parent (or child) of
a node, given an address, a level in the hierarchy, and the current node number. This
particular mapping function only works for machines whose radices are powers of two.

A global address consists of two parts. The map part must encode the information
necessary for the mapping function to operate, such as a global processor ID. The key part
is used to distinguish among addresses with identical map parts, such as local addresses
on a single processor. There are no restrictions as to how the map and key parts may be
combined to form a global address.

Any node can store any block at the leaf level. To calculate the parent for that block,
replace some part of the current node number with the map part. For example, on the
J-Machine, which has a three-dimensional mesh network, take the low bits of the node
number's three coordinates and replace these three bits with the corresponding three bits
from the global address. This strategy implies that the highest level nodes will store only
blocks whose map part of their addresses equals their node number. Figure 2.13 illustrates
the J-Machine mapping function.

Figure 2.13: This figure demonstrates the mapping function used for a 3-dimensional mesh.
The three node numbers indicate which nodes can store address H. H could be stored on
any leaf (level 0) node. To calculate the level 1 node that H could be stored at, replace the
low three coordinate bits, one from each dimension, with their corresponding value from H.
To calculate the level 2 node, replace the next highest three coordinate bits, etc.
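The bit replacement can be written out as a short sketch. It assumes that the map part of an address has already been split into one coordinate per mesh dimension and that coordinates are plain non-negative integers; the function names are hypothetical and not taken from the J-Machine implementation.

```python
def node_at_level(leaf, map_part, level):
    """Coordinates of the level-`level` directory node for an address.

    leaf     -- coordinates of the requesting processor, one per dimension
    map_part -- per-dimension coordinates taken from the address's map part
    level    -- 0 for the leaf itself, 1 for its parent, and so on
    """
    mask = (1 << level) - 1  # the `level` low bits of each coordinate
    return tuple((c & ~mask) | (m & mask) for c, m in zip(leaf, map_part))


def parent_of(node, map_part, level):
    """Parent (level + 1) of a level-`level` node: replace one more bit."""
    bit = 1 << level
    return tuple((c & ~bit) | (m & bit) for c, m in zip(node, map_part))
```

For a leaf at (5, 2, 7) and a map part of (2, 7, 0), the level 1 node is (4, 3, 6) and the level 2 node is (6, 3, 4). With three bits per coordinate, the level 3 node equals the map part itself, which matches the observation that the highest level nodes store only blocks whose map part equals their node number.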

This mapping function will keep messages confined to physically small areas whenever
possible. A message being sent from the leaf level to the first level will by definition have a
destination somewhere within the eight (more generally, 2^n) node cube which includes the
sender. Assuming bidirectional links, the farthest such a message would need to travel is
three (n) hops. More generally, the farthest a message will have to travel to communicate
between levels i and i + 1 is n * 2^i hops. On average, assuming random destinations, the
distance is only n * 2^(i-1) hops. There will be more discussion of this embedding in Chapter 6.
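The travel distances follow from the bit-replacement mapping and can be checked by enumeration. The sketch below assumes a mesh without wraparound, counts hops as the sum of per-dimension coordinate differences, and treats each replaced bit as equally likely to match or differ; the function name is hypothetical.

```python
from itertools import product


def hop_stats(n, i):
    """Max and mean hops between level i and level i + 1 on an n-dim mesh."""
    step = 1 << i
    # In each dimension, bit i either already matches the address (0 hops)
    # or must be flipped (2**i hops); all 2**n combinations are enumerated.
    distances = [sum(flip * step for flip in flips)
                 for flips in product((0, 1), repeat=n)]
    return max(distances), sum(distances) / len(distances)
```

For a three-dimensional mesh, the leaf-to-first-level case (i = 0) gives a maximum of three hops, agreeing with the bound quoted above, and a mean of one and a half hops.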



2.3 Summary



This chapter described the operations of PHD. PHD supports cache coherent read, write,
and test-and-set operations. Read requests are satisfied in the smallest subtree containing
both the requester and a copy of the requested block; only three sets of messages are sent
up or down that subtree. Write requests are confined to the subtree containing the lowest
common ancestor of the requester and all copies of the requested block; four sets of messages
traverse the hierarchy, two of which fan out to all nodes with copies. The test-and-set request
is implemented as an optimized combination of read and write requests, and implements a
test-and-test-and-set operation.

This chapter also described a mapping of PHD to arbitrary k-ary n-cubes. The mapping
translates hierarchical locality into physical locality. The mapping also statically spreads
higher-level tree nodes onto many physical processors, in order to increase bandwidth and
prevent bottlenecks at the top of the tree.



Chapter 3



Protocol Issues 



Many decisions must be made in the design of a complex system. These decisions often
involve tradeoffs between space, time, and complexity. This chapter discusses some of the
tradeoffs that were made in the design of the Protocol for Hierarchical Directories as well
as the consequences of these decisions.

Section 3.1 examines those tradeoffs intended to increase the parallelism in comparison
to other hierarchical protocols by increasing the asynchrony. Section 3.2 considers small,
easily changeable design decisions that further optimize the performance of the protocol.

3.1 Parallelism

PHD was designed to reduce the serialization of protocol actions by introducing parallelism
in the satisfaction of requests. Parallelizing a problem, however, often makes the problem
more complex. Many of the choices made in the protocol design therefore significantly
increased the complexity of the protocol. Whether there is a commensurate decrease in
latency is an open issue to be studied.

3.1.1 Extra Traversals of the Hierarchy

Extra traversals of the hierarchy provide information to a protocol. Avoiding extra
traversals of the hierarchy increases both the state necessary to support a protocol and the
complexity of a protocol. There are two specific cases of this tradeoff in PHD: one in the read
request mechanism, and one in the asynchronous invalidate mechanism. We briefly compare
four different solutions to this tradeoff, three of which are part of existing protocols.

[Figure 3.1 panels: DDM; PHD; NAI, DHP]

Figure 3.1: This figure compares the number of traversals of the hierarchy needed for a read
request for the four different protocols: DDM, PHD, NAI, and DHP. The black node is
performing a read request. The grey nodes have copies of the block being requested.



DDM  The DDM protocol [12] requires more traversals of the hierarchy than do any of
the other protocols. This requirement is reasonable given the assumptions of that project:
they propose to implement their protocol on a bus-based system, where the hierarchy is
fixed in hardware and cannot be circumvented. During a read request, four traversals of
the hierarchy occur (see Figure 3.1): first up, to find a node which knows where a copy is,
then down, to a node with a copy, then back up and down through the network, updating
the interior directories as the read occurs.

There is only partially asynchronous invalidation in the DDM protocol. In order to
discard a block, a node must initiate a transaction which carries the data. This transaction
will continue to propagate upwards until at least one other copy of the block is found. This
system prevents the protocol from deleting the last copy of a block. The transaction must
carry the value with it, unlike in the other protocols, in order to be sure that the value is
preserved.

This protocol differentiates four read states in the hierarchy for a subtree: invalid,
reading, answering, and valid data. These states are updated as the read travels twice up
and down the hierarchy, and provide full information to the protocol as to the exact stage
of a read. The valid data state only implies that the data has been sent into the subtree,
not that the data is still there.

PHD  The PHD protocol requires one fewer traversal of the hierarchy for a read request
than does the DDM protocol. PHD, as illustrated in Figure 3.1, also routes the read request
up and down through the hierarchy, but then sends a copy of the value directly through the
network to the node requesting the read, and then updates the hierarchy by a confirmation
sent only upwards.

PHD can also completely asynchronously discard blocks. This feature allows most nodes
to quickly discard cache entries whenever the caches are full. Unlike the DDM protocol,
the value is not carried in the discard message. The use of a special owner node guarantees
that all nodes will not simultaneously discard their copies. The owner, which is defined as
the last node to write to a block, cannot asynchronously discard its copy. If a particular
node is the only node to write to many blocks, its cache will eventually fill up. PHD can
be extended to solve this problem by adding a message which requests that some other node
write the value (freeing it from the full node's cache). This solution introduces complicated
load-balancing issues not addressed by this thesis.

The combination of these two features introduces complexity to the protocol. Although
invalidation is now simpler, because the value is not carried in the deletion message, the
problem of the owner capacity overflow has been introduced. The longest path for a read
request is now shorter than it was before, but the read-combining path is slightly longer.
Reads which have been combined do not receive the value of the data until after the
confirmation step of the read request. A read request that has been read combined must, in the
worst case, wait through two traversals of the hierarchy (one up and one down), one message
being sent across, a confirmation being sent up through the hierarchy, and the data finally
being sent down to the combined nodes.

PHD also differentiates four read states in the hierarchy for a tree: invalid, reading,
waiting for a read combination, and valid data. The valid data state means that the subtree
received the data but may have already deleted it.

EHP The EHP protocol [21] both requires the fewest number of traversals of the hierarchy
and implements asynchronous invalidation. Together this combination results in a protocol
vulnerable to deadlock, because not enough information in the hierarchy is available to
tell whether or not a subtree is waiting to receive a block, has already received the block,
or has received and already deleted (but not yet propagated this information upwards)
the block. This lack of information is used as the read combining mechanism: if a read
request reaches a node which is in the process of reading data, it waits there until the data
arrives. This mechanism by itself is perfectly reasonable. Unfortunately, when combined
with asynchronous invalidation, the mechanism results in deadlock, where two nodes can
each end up waiting for the same block, and each also be promising to tell the other when
they receive the block. In this case neither request can ever be filled.

The read combining mechanism of this protocol theoretically reduces the path length of
the longest combined read. The only problem which can occur is read chains, where a list
of nodes is waiting for a block. The values will propagate one at a time; upon receiving the
value it has been waiting for, a node forwards that value to its own list of waiting nodes.
Those nodes in turn might themselves have lists of other nodes waiting for the same value.

The EHP can only differentiate two states for subtrees in the hierarchy: invalid and
valid, where valid implies that the subtree will receive the data, has received the data, or
had received (and already deleted) the data. It cannot use any other states because every
node is only visited once.

NAI A fourth protocol, not yet proposed, is identical to the EHP except that it
eliminates the asynchronous invalidation ability. We call this protocol NAI (No Asynchronous
Invalidation). Read requests still take only two traversals of the hierarchy. There are still
only two states for subtrees in the hierarchy but the meanings have changed: now the
hierarchy keeps track of whether a subtree is invalid or has or will get a particular block.
This protocol eliminates the deadlock situation of the EHP by guaranteeing that a "valid"
subtree either has a copy of the block or has an outstanding read request which is being
satisfied outside of that subtree.

The disadvantage of this protocol is that it requires that nodes reserve enough room in
their caches for the blocks that they delete between the time that the deletion is initiated
and the time that they receive an acknowledgment indicating that it is safe to perform the
deletion.
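The four solutions compared above can be collected into a small table. The following Python sketch records only what the text states; the class and field names are illustrative, the protocol of [12] is written as DDM, and the count of three traversals for PHD is inferred from "one fewer" than the four of the protocol of [12].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolSummary:
    """Illustrative summary record; field names are assumptions."""
    name: str
    read_traversals: int       # traversals of the hierarchy per read request
    async_invalidation: str    # "partial", "full", or "none"
    subtree_states: tuple      # read states kept per subtree


PROTOCOLS = [
    ProtocolSummary("DDM", 4, "partial",
                    ("invalid", "reading", "answering", "valid data")),
    # PHD: every node except the owner may discard asynchronously.
    ProtocolSummary("PHD", 3, "full",
                    ("invalid", "reading",
                     "waiting for a read combination", "valid data")),
    ProtocolSummary("EHP", 2, "full", ("invalid", "valid")),
    ProtocolSummary("NAI", 2, "none", ("invalid", "valid")),
]


def fewest_traversals(protocols=PROTOCOLS):
    """Return the names of the protocols with the shortest read path."""
    best = min(p.read_traversals for p in protocols)
    return [p.name for p in protocols if p.read_traversals == best]
```

Read this way, the tradeoff is visible at a glance: fewer traversals leave fewer visits in which to record state, so the two-traversal protocols can distinguish only two subtree states.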
Figure 3.2: This figure shows how the distributed write commit works. Node 1 is in the
process of requesting a write. The grey nodes have copies of the object; nodes marked L
are locked. Consider what happens if Nodes 2-8 all requested read operations at this point,
and the lock messages all froze in the network, so that the reads would have time to
complete. Nodes 4 and 6 would complete their read requests, because they have locally
cached copies. The requests from Nodes 3, 7, and 8 would stall on the write, because they
would reach a locked node before reaching a valid node. Node 5, on the other hand, would
be able to complete its read, because a valid node above it is still unlocked. Node 2 would
be able to complete its read, because its request is coming from the writing subtree.



3.1.2 Distributed Write Commit Point



The commit point for a write is distributed in PHD. A write waits until all reads in progress
finish. After a particular time, newly starting reads will be stalled until the completion of
a write; the calculation of this time is distributed. As soon as a lock message reaches a
node, no read originating from any invalid subtree below it will be satisfied until after the
write. The one exception to this rule is that reads coming from nodes in currently writing
subtrees will be allowed to complete as well. An example of the various cases of reads and
writes interacting to form the commit point is shown in Figure 3.2. This scheme supports
sequential consistency [14], but also adds complexity to the protocol.
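The stalling rule can be made concrete with a short sketch. The Python below is a simplified illustration, not the protocol's state machine: the tree, the node names, and the `valid`, `locked`, and `writing` sets are hypothetical, and a read is reduced to a single walk up the hierarchy from the requesting leaf.

```python
def read_outcome(start, parent, valid, locked, writing):
    """Classify a read request that walks up from leaf `start`.

    parent  -- maps each node to its parent (the root maps to None)
    valid   -- nodes whose subtree is known to hold a copy
    locked  -- nodes that a lock message has already reached
    writing -- nodes marked as belonging to the currently writing subtree
    """
    node = start
    from_writing_subtree = False
    while node is not None:
        if node in writing:
            from_writing_subtree = True
        if node in locked:
            # A locked node stalls the read unless the request comes from
            # the writing subtree, which is the one stated exception.
            return "completes" if from_writing_subtree else "stalls"
        if node in valid:
            # An unlocked valid node can still satisfy the read.
            return "completes"
        node = parent[node]
    return "no copy found"


# Hypothetical three-level tree (not the tree of Figure 3.2).
PARENT = {"n1": "A", "n2": "A", "n3": "B", "n4": "B", "n5": "C",
          "A": "R", "B": "R", "C": "R", "R": None}
VALID = {"n4", "B", "R"}
LOCKED = {"R", "A", "C"}
WRITING = {"n1", "A"}
```

In this example `n4` completes from its local copy, `n3` completes at the still-unlocked valid node `B`, `n5` stalls at the locked node `C`, and `n2` completes because its request passes through the writing subtree rooted at `A`.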

The EHP protocol, on the other hand, does not make any guarantees about reads making
progress. It is possible that a read in the EHP can be indefinitely delayed by a series of
write requests coming from other nodes; the read will spend all of its time searching the
machine for a valid copy.

3.2 Design Decisions

Several design decisions which were made in the construction of the Protocol for Hierarchical
Directories could be easily varied. These decisions involve the read combining mechanism,
the write invalidation mechanism, and the invalidation mechanism.

3.2.1 Read Combining

The design of the read combining mechanism is another example of a time-space tradeoff.
The read combining of PHD delays read requests from later requesting nodes until the
requests from earlier requesting nodes have been answered. This delay prevents the later
requests from sending another set of location and data transfer messages. Another way to
implement read combining is to use address-specific delaying queues which are examined
whenever a block's directory state on a node changes. This strategy saves some bits in the
state of each node, because a slot to store the waiting status of every subtree for every block
is no longer needed.

Using these queues provides two methods for deciding what to do when a lock message
reaches a directory node which has children waiting for the value. The first is to send the
lock message to all waiting child subtrees, as in the current version of PHD. Unlike PHD,
however, this scheme requires that a lock message always check the delaying queue before
continuing, in order to find out which subtrees need copies of the value. The second scheme
is to not lock waiting subtrees; the reads coming from those subtrees are considered to
happen after the writes.

Another interesting question to consider about read combining is whether it is worthwhile
at all. Without read combining the protocol becomes substantially simpler. It is
not clear how often nodes request the same value nearly simultaneously, except for special
synchronization variables; these could be handled separately.

3.2.2 Read Combining Whenever Possible

The read combining mechanism of PHD combines two read requests whenever they occur
nearly simultaneously. When a read reaches a node with a subtree that is already reading
that block, and that node has no subtrees which definitely have copies of the block, read
combination occurs. The question is what to do in the situation, shown in Figure 3.3, where
a read request propagates up to a node that has both a subtree in the middle of a read
and a subtree with a definite copy of the block. If a new read request is sent down to the
subtree with a copy, one-fifth of the protocol table (shown in Table C.1) will be no longer
reachable and can be eliminated, because the combined vector state vX_wX_cX (subtrees
may be invalid, valid, waiting, or confirmed) can no longer occur. The other possibility
for implementation is that the read request be combined, and thus forced to wait until
the first read completes, when it will be sent its value. Both versions of the protocol have
been simulated. The scheme which combines read requests performed better on the studied
synthetic traces and is currently implemented in the system.

Figure 3.3: This figure shows the two possibilities for read combining. In both methods,
Node 1 requests a read operation. Since it has no copy of the block, the request is sent up
to the first node that has it, in this case the root node. Now Nodes 2, 4 and 12 issue read
requests to the same block. Node 2's request waits at its parent, as explained in Chapter 2.
Similarly, Node 4's request waits at its grandparent. The issue is what happens to the
request from Node 12. The request could be sent down the path to Node 6, like Node 1's
request was, or the request could be combined.



3.2.3 Write Invalidate versus Write Update

Another interesting issue is whether to invalidate a block or to update it with the new value
when a remote write occurs. PHD could be modified to use an update scheme, instead of
an invalidate one. For performance, a mechanism to periodically remove unused copies of
blocks would be required. Without this capability, writes would involve all nodes who had
ever read the block and whose caches had not subsequently chosen to discard the block.
Write update could be extremely valuable in some situations, such as where a few nodes
were constantly sharing data.

3.2.4 Non-leaf Invalidation

The protocol does not currently address the issue of a full cache in a non-leaf node. When
non-leaf nodes fill up, we cannot simply discard the non-leaf values. Scott and Goodman
have addressed this problem in [24]. Their solution is pruning caches, which could be
adapted to work in PHD. In their pruning cache scheme, non-leaf directory entries may be
discarded when a cache is full. Pruning caches store information about where a block does
not reside rather than where it does reside. In their protocol, if a lock message ever reaches
an invalid entry located in a node whose parent has a valid entry, lock messages must be
broadcast to all children. Scott and Goodman have determined that "pruning caches with
a modest hit rate significantly reduce the invalidation traffic." They also found, in their
simulation studies, that when a cache filled up and they needed to discard an entry, "it is
better to suffer increased invalidation traffic when the line is written than to prematurely
invalidate the line."

3.3 Summary

This chapter examined some of the decisions made in the design of the Protocol for
Hierarchical Directories. Some of these decisions are easily changeable implementation issues.
Others, such as the number of traversals that should be made of the hierarchy, expose major
differences between PHD and other protocols.

Although the minor decisions could be easily isolated and tested to determine which
performs better, the major ones cannot be tested in isolation, as they are not necessarily
separable from each other. In order to determine the benefits of these major decisions, we
must compare the performance of PHD and the other hierarchical cache coherence protocols
for a variety of benchmarks.



Chapter 4

Simulator



We wrote a simulator to model the operation of the protocol running on a computer, such
as the J-Machine, with a k-ary n-cube network topology. The simulator currently models
machines of 64 nodes with two or three dimensions. The simulator is trace-driven, taking
as input a statically scheduled list of memory references and simulating them by following
the protocol. It outputs a log file detailing the steps it took. We also wrote a verification
program which takes the output of the simulator and verifies that it follows a legal ordering
of events.

4.1 Overview

The simulator serves two purposes: first, it tests the protocol and second, it provides a
platform for studying several characteristics of protocol behavior. In particular, it provides
a method to examine the number of messages sent per operation, the longest path traveled
per operation, and the average height in the tree reached per operation for different types of
operations. The results of this study are in Chapter 5. The simulator was not designed to
support an analysis of how the protocol behaves when burdened by network constraints and
different costs for different activities. An analysis of these issues is located in Chapter 6.

The simulator operates at the message level; one unit of simulated time is the time a
message takes to travel one hop between two adjacent nodes. The time a message takes
to travel from node A to node B is therefore equivalent to the distance in hops between
nodes A and B. Operations which can be satisfied locally, such as a local read, write, or
test-and-set, occur instantaneously on a single node. Message processing for a node, on the
other hand, takes a constant amount of time, 10 hops, during which the node is busy and
can process no new events.

Time and event sequencing is represented by an event-driven queue. The queue
represents a range of time. Each slot in the queue corresponds to one particular time, and
contains a list of events to occur at that particular time. There is one global queue for the
simulation plus one local queue per processor.

Events are removed from the queues and processed according to their type: messages
or operations. The simulator supports local allocation, read, write, and test-and-set
operations. It additionally supports all of the types of messages specified by the protocol. The
global event queue also supports printing events, cache-emptying events, warm start events,
and memory-dumping events.

In addition to listing every operation as it completes, the simulator can be configured
to print any of the following log information: events processed (as they occur in the event
queue), messages processed, and messages sent. The simulator can be configured to print
out many different types of statistics about the protocol and its operation.

4.2 Data Layout

The "global" memory of the system is scattered throughout the nodes. Each node has a
section of its memory devoted to storing the data blocks that it has copies of, or knows
about. It also has a section which contains mappings from an {address, level} pair to a
pointer to the data block stored in the data section.

There are two types of entries which might be pointed to from the data-mapping table.
The first type is a leaf entry. It represents an actual block of data, and corresponds to a
block of memory which would be found in a node located at the bottom of the hierarchy.
The second type is a parent entry. A parent entry stores information about which subtrees
have copies of particular blocks. These entries correspond to memory which would be found
on a node of the hierarchy not located at the leaf level.

A leaf cache entry, shown in Table 4.1, takes up N+2 words, where N is the line size.

    Word 1          State | Word
    Word 2          Address | Level
    Words 3..N+2    Data (N words)

Table 4.1: The parts of a leaf cache entry.

    Word 1          State | VWC Vector | Writer
    Word 2          Address | Level

Table 4.2: The parts of a parent cache entry.



The first word contains the state of the entry, as described in Chapter 2. It also contains
the information, during writes, of which word in the cache line is being written. The second
word encodes the global address of the object and the level. The final N words are the
value of the block starting at the address. Because only the address representing the start
of the block is important, this implementation hides the level in those redundant bottom
address bits.

A parent entry is any entry which does not correspond to a leaf of the hierarchy. It is
always composed of exactly two words. The first word contains the bits specified by the
protocol, as well as a vector of up to sixteen bits indicating which subtrees have copies
of that object. The vector contains two bits per subtree. The writer field, which is used
only during writes by nodes located on the path between the write requester and the lowest
common ancestor of all copies, stores the index of what subtree is performing the write, so
that lost read requests can be routed to the writer, as described in Chapter 2. A 32-bit
entry can actually store up to 12 subtrees, although only the capability for eight is in the
current simulator version. The second word encodes the global address of the object and
its level, exactly as in the leaf entry.
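The two packing tricks described above can be sketched briefly. The Python below is a minimal illustration under stated assumptions, not the simulator's code: addresses are taken to be word addresses, the line size is assumed to be a power of two so that a block's start address has zero low bits, and the numeric codes for the four subtree states are arbitrary choices. All function names are hypothetical.

```python
def pack_address_level(block_address, level, line_size):
    """Hide `level` in the low bits of a block-aligned address (assumed layout)."""
    if line_size <= 0 or line_size & (line_size - 1):
        raise ValueError("line size is assumed to be a power of two")
    low_mask = line_size - 1
    if block_address & low_mask:
        raise ValueError("address must be the start of a block")
    if not 0 <= level <= low_mask:
        raise ValueError("level does not fit in the redundant bits")
    return block_address | level


def unpack_address_level(word, line_size):
    """Recover (block_address, level) from a packed word."""
    low_mask = line_size - 1
    return word & ~low_mask, word & low_mask


# Two bits per subtree; the ordering of the codes is an assumption.
SUBTREE_STATES = ("invalid", "valid", "waiting", "confirmed")


def set_subtree_state(vector, index, state):
    """Return `vector` with the two bits for subtree `index` set to `state`."""
    shift = 2 * index
    cleared = vector & ~(0b11 << shift)
    return cleared | (SUBTREE_STATES.index(state) << shift)


def get_subtree_state(vector, index):
    """Read the state stored for subtree `index`."""
    return SUBTREE_STATES[(vector >> (2 * index)) & 0b11]
```

With eight subtrees the vector occupies exactly sixteen bits, which matches the size quoted for the current simulator version.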

The simulator manages the simulated heap by using a memory allocation manager. The
manager prevents memory from being fragmented by rearranging memory whenever a block
is freed, and throwing away purgeable blocks when necessary. The simulator also provides
a method for removing blocks from the cache on demand from the input, in order to test
the protocol.

4.3 The Simulator 

The main parts of the simulator are the node model, the network model, and the
event-driven queues.

4.3.1 Node Model

Only the state of a node essential to the operation of the simulator is modeled. Every node
has a node id, a local event queue, a set of delaying queues, and memory. Associated with
the memory is a table which stores mappings between {address, level} pairs and pointers
into the memory. In an implementation on an actual machine, the table would be part of
the memory.

Because the processing of a message in a node occurs in a single time-step, part of the
state of a node stores whether or not a node is busy and, if so, for how long. A special
event, node_done, is added to the local event queue at the time when a node should finish
processing the current message. A node will perform no actions in the meantime.

Each node also stores a special association list recording the value to be written for
any ongoing write. This information would normally be stored directly in the instruction
stream.

Ideally, each node can timeshare among several different processes. When a process
makes a global memory reference which is not locally satisfiable, it suspends while the
reference is filled. In the meantime, other processes can run.

The simulator models this ability by allowing n global requests per processor to be
occurring simultaneously, where n is an execution-time parameter. Operations which have
already been read in from the input trace are placed in a reference queue which is part of
the node model, but is disjoint from the event queues. Whenever an operation on a node
completes, the queue for that node is checked. If the next operation on that queue is due to
happen in the future, it is scheduled. If the next operation was supposed to happen already,
it is started. If there is no waiting operation, the parser is invoked to read more input.

4.3.2 Network Model

The network of the J-Machine is a three-dimensional mesh. The simulator models this
network, or optionally a two-dimensional mesh, in a completely unloaded condition, i.e.
under a zero congestion situation. Message delivery takes time proportional to the distance
between the nodes along a Manhattan route: first the X direction is followed, then the Y,
then the Z. A message is "sent" at the end of the period of time corresponding to any
processing that a node is doing.

We have chosen parameters such that each hop in the network takes one-tenth of the
time to process a message. This models a system with balance between computation and
communication for fine-grained processing. The longest message sent is 4+N words (N is
the line size); the shortest 2 words. All messages are approximated as being the same length
for purposes of arrival time. If the destination of a message is the node that generated the
message, the transmission is suppressed and the computation continues immediately.
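The timing rules of this network model fit in a few lines. The Python sketch below is an illustration under assumptions, not the simulator's code: node ids are assumed to map to mesh coordinates in row-major order with X as the lowest digit, the mesh is assumed to have no wraparound links, and all function names are hypothetical.

```python
PROCESSING_TIME = 10  # time to process one message, measured in hops


def coords(node_id, radix, dims):
    """Convert a node id to mesh coordinates (X first); assumed numbering."""
    result = []
    for _ in range(dims):
        result.append(node_id % radix)
        node_id //= radix
    return tuple(result)


def hop_distance(src, dst, radix, dims):
    """Manhattan distance between two nodes on an unloaded mesh."""
    a, b = coords(src, radix, dims), coords(dst, radix, dims)
    return sum(abs(x - y) for x, y in zip(a, b))


def route(src, dst, radix, dims):
    """Coordinates visited in dimension order: all of X, then Y, then Z."""
    position = list(coords(src, radix, dims))
    target = coords(dst, radix, dims)
    path = [tuple(position)]
    for axis in range(dims):
        step = 1 if target[axis] > position[axis] else -1
        while position[axis] != target[axis]:
            position[axis] += step
            path.append(tuple(position))
    return path


def arrival_time(processing_start, src, dst, radix, dims):
    """A message leaves when processing ends and takes one unit per hop.

    A message addressed to its own sender travels zero hops, which models
    the suppressed transmission.
    """
    return processing_start + PROCESSING_TIME + hop_distance(src, dst, radix, dims)
```

For a 64-node machine, the three-dimensional case corresponds to `radix=4, dims=3` and the two-dimensional case to `radix=8, dims=2`; the corner-to-corner distances are 9 and 14 hops respectively.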

4.3.3 Event-Driven Queues

Time is implemented as a circular list of queues. At the start of a simulation, the simulator
reads a block of the input and schedules the specified events. It places each event in the
queue entry representing the appropriate time, creating entries as needed. It then begins
processing the queue. The simulator processes input on a node by node basis when any
node runs out of operations to perform, as more fully described in Section 4.3.1. When
there are no more events in any of the queues, the simulator halts.
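A circular list of per-time queues is often called a time wheel. The Python sketch below shows one way such a structure could work; it is an assumption-laden illustration rather than the simulator's implementation. In particular, it uses a fixed number of slots instead of creating entries as needed, so events can only be scheduled within a bounded horizon, and the class and method names are hypothetical.

```python
class TimeWheel:
    """Circular list of event queues, one slot per unit of simulated time."""

    def __init__(self, horizon):
        self.slots = [[] for _ in range(horizon)]
        self.now = 0
        self.pending = 0

    def schedule(self, time, event):
        """Place `event` in the slot for `time` (must be within the horizon)."""
        if not self.now <= time < self.now + len(self.slots):
            raise ValueError("event lies outside the wheel's horizon")
        self.slots[time % len(self.slots)].append(event)
        self.pending += 1

    def run(self, handler):
        """Process events in time order until no events remain, then halt."""
        while self.pending:
            slot = self.slots[self.now % len(self.slots)]
            while slot:
                event = slot.pop(0)
                self.pending -= 1
                # The handler may schedule further events, including at `now`.
                handler(self.now, event, self)
            self.now += 1
```

A handler that reacts to a message by scheduling a `node_done` event ten time units later reproduces the busy period of the node model.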

Global Queue The global event-driven queue keeps track of the events that are to be
activated during each time slice. There are several different types of events. Operations
are specified by the input file, and include READ, WRITE, TAS, and ALLOC. As mentioned
earlier, there are also various types of debugging events, not necessarily specific to particular
nodes, which can be specified.

Local Queue Local event-driven queues are used, one per node, to keep track of node
specific events, such as messages, and global events that become local. Messages are
generated in response to other messages or operations. These queues correspond most closely to
the message queues that would be found on some machines.

Delaying Queue Each node additionally has delaying queues. Events which cannot be
satisfied until another event occurs are placed on an address-specific delaying queue, and
woken up only when an event referring to that address occurs.
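An address-specific delaying queue can be sketched as a map from block address to a queue of parked events. The Python below is a minimal illustration; the class and method names are hypothetical and the events are left as opaque values.

```python
from collections import defaultdict, deque


class DelayingQueues:
    """Per-address queues of events waiting for activity on that address."""

    def __init__(self):
        self._waiting = defaultdict(deque)

    def delay(self, address, event):
        """Park `event` until something happens to `address`."""
        self._waiting[address].append(event)

    def wake(self, address):
        """Release, in arrival order, every event parked on `address`."""
        queue = self._waiting.pop(address, None)
        return list(queue) if queue else []
```

A node would call `wake` whenever it handles an event that refers to an address and would then reconsider each released event; events parked on other addresses are left untouched.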

4.4 Verification

The simulator was tested by running a hand-written set of tests, designed to exercise all of
the features of the protocol, as well as many sets of machine-generated synthetic address
streams. The simulator also contains self-consistency code, ensuring that an error is
signalled if state and message combinations which are illegal occur. A verifier was written in
order to help verify the simulator.

4.4.1 Verifier

We wrote a verification program which takes the output of the simulator and ensures that
the output sequence of reads, writes, and test-and-sets is a legal ordering of the requested
events. The verification of the verifier was done for a large set of hand-crafted test cases.
The rules that the verifier obeys are as follows:

• Any read that finishes before a write operation starts will see the old value.

• After a write operation finishes, the value changes, and any read that starts gets the
new value.

• Any read that starts before a write operation finishes and finishes after the write
operation starts may see the old or new value.

• A test-and-set may only complete successfully if the value of the data is zero at the
point when the set would occur.
4.4.2 Internal Checking

The simulator is peppered with assertions which check for illegal states and combinations.
The simulator also has an option allowing one to choose maximum bounds on a random
interval for ejecting values from the cache. When this interval is set to one, values are
ejected from the cache one time unit after they are placed there, allowing for a thorough
testing of the protocol.



4.5 Summary

This chapter describes the simulator used to experiment with PHD. The simulator
implements the full protocol plus certain extensions, such as local allocation and optional
automatic allocation on uninitialized data. The simulator is trace-driven, and can gather
many types of statistics for studying the protocol.

The simulator has been used to test the protocol; additional features for debugging
include printing events and cache-emptying events. A special verification program was also
designed to ensure that the protocol keeps the memory consistent.

The simulator runs 3,999,800 cycles in just under two hours. This represents 4096
allocation requests, 384,417 read requests, and 255,583 write requests, all resulting in a
total of 1,496,929 messages being sent. Each node was allocated 0x3000 words of memory
for this simulation. This time measurement took place on an unloaded Sparc II with 32
megabytes of DRAM, accessing only locally stored files, and was typical of how the simulator
was actually run.



Chapter 5

Abstract Analysis



This chapter presents an abstract model of the Protocol for Hierarchical Directories and
then uses the model to show the effects of locality and machine size on several characteristics
of the protocol. The model is shown to be valid using results generated by the simulator
described in Chapter 4. The model is used to study the average height per operation, the
longest path of messages traveled per operation, and the number of messages generated per
operation for machine configurations too large to simulate. The results from this chapter
become the inputs of an embedded model, described in Chapter 6, which addresses the
protocol behavior when it is mapped onto a specific architecture.

Section 5.1 describes the model and a new method of representing the amount of locality
in an application. Sections 5.2 through 5.5 discuss the applications and methods used to
validate the model. Section 5.6 presents the results of the study, showing the importance of
locality as machines increase in size. An alphabetical listing of all of the variables defined
in this thesis can be found in Table A.1.

5.1 Modeling Hierarchical Behavior

Before measuring the application-dependent behavior of the protocol, the protocol
characteristics to be measured, the application characteristics needed to measure these aspects of
the protocol, and a model of the protocol behavior must be defined.

5.1.1 Overview

Since we are primarily interested in understanding how a hierarchical protocol scales as
machine size and locality change, we have studied three application characteristics: the
average height in the tree a read or write operation reaches, the length in message hops of
the "longest" path traversed in order to satisfy a read or write operation, and the number
of messages generated per read or write operation.

Because this chapter does not study the protocol as mapped onto a particular
architecture, issues such as whether or not the calculated message-generation rate can be sustained
due to bandwidth considerations are not considered. Similarly, we are also assuming infinite
caches, since finite cache effects complicate the model, clouding the important
multiprocessor issues under consideration. Finite cache modeling can always be factored in later [1].

5.1.2 Locality Characteristics

In order to study the behavior of a cache coherence protocol which is highly dependent
on locality, we must have some method of expressing the locality present in applications.
We propose a representation of locality tailored to studying hierarchical cache coherence
protocols.

Shared data ope rati ons are al \\ays caused by node request s . Inst e ad of choosi ng a node 
to nake a request and f ol 1 ova ng that request up the hi erar chy, as i n the actual protocol , 
the abstract nodel chooses \\hi ch class of nodes a request occurs in. The actual node 
that nakes the request i s uni nport ant , al 1 that natters i s -what cl ass that node is i n -ni th 
respect to vhat other nodes have copi es of the block. All nodes inaclass have equal - hei ght 
1 o\\est val f Ancestors . Choose the node to nake a request as f ol 1 ows ( see R gure 5. 1 for an 
i 1 1 us t rati on) : start at the root node of a di rectory tree, and choose f romone of t wa groups : 
the i nval i d and t he val i d subtrees of the root. If the invalid class is chosen, the process 
stops . If the val i d cl ass is chosen \\e agai n choose f romt wa groups : the i nvali d and the 
val i d subtrees of the val i d chi 1 dren of the root . Thi s process conti nues unti 1 an i nval i d cl as 



^Ki rk Johnson greatly assistedinthe devel opiient of thi s 1 ocali ty niodel . 

^Because all requests are modeled as occurring instantaneously, we consider only two states: valid and
invalid. Valid implies that there is a copy in the subtree; invalid implies that there is not.
Figure 5.1: This diagram illustrates the selection of node classes performed by the model.
The grey nodes are valid. The selection process starts at the root node, where either the
group of valid subtrees or the group of invalid subtrees is chosen. In the left side of the
figure, the valid group is chosen. Because the valid class was chosen, at the next level
another selection must be made. At this selection, the invalid group is chosen. This means
that the node to make the next request will be in the class of nodes which are not valid, but
whose parents are valid. In the right side of the figure, the selection process again begins at
the root, where the invalid group is chosen. This ends the selection process; the next request
will be made by a node which is invalid and whose parent is invalid but whose parent's
parent is valid.



is chosen, or until the leaf is reached. If an invalid class is chosen, all nodes below that
class are in the group of nodes that will make the next request. If a valid path down to the
leaves is chosen, all leaf nodes which are valid are in the group of nodes that will make the
next request.
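The selection walk described above can be sketched in a few lines of Python. This is a
minimal illustration, not the thesis's implementation: the function names and the list
`t_down` (the probability of choosing the valid group at each step, ordered from the root
downward) are assumptions introduced here, and the probabilities are taken as given inputs.

```python
import random

def select_request_height(t_down, rng):
    """Walk down from the root and return the height of the lowest valid ancestor.

    t_down[i] is the (assumed) probability of choosing the valid group at the
    i-th step below the root; a tree with L levels needs L - 1 entries.
    """
    levels_below_root = len(t_down)
    for step, t in enumerate(t_down):
        if rng.random() >= t:
            # Invalid group chosen: the walk stops, and the requester's lowest
            # valid ancestor is the node we were looking down from.
            return levels_below_root - step
    # A valid path was followed all the way to a leaf: the requester has a copy.
    return 0

def expected_height(t_down):
    """Exact expectation of select_request_height for the same inputs."""
    total, reach = 0.0, 1.0  # reach = probability the walk gets to this step
    for step, t in enumerate(t_down):
        total += reach * (1.0 - t) * (len(t_down) - step)
        reach *= t
    return total
```

For a three-level tree, choosing the invalid group at the root yields a height of 2, and
choosing valid and then invalid yields a height of 1, matching the two cases in Figure 5.1.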

We calculate the probability of choosing the valid class as follows. We define $p_l$, the
locality parameter of level $l$, as the a priori probability that the choice will be the valid
group when looking down from level $l$. These locality parameters can be different at each
level of the tree. If the request did not a priori come from an already valid subtree, we
distribute the probability of where it came from uniformly over all of the children. The
locality in an application is thus expressed by this set of locality parameters. For example,
in an application where blocks were accessed uniformly by all processors, all $p_l$ would be 0.

This set of locality parameters lets us describe an application's data usage. For greater
accuracy, instead of considering an average block, we could consider several classes of blocks
with their own sets of $p_l$.

5.1.3 Model



The performance model calculates the average height in the tree, the longest path traversed,
and the number of messages sent per read and per write operation, given $p_l$, the locality
parameters; $w$, the write ratio; $b$, the branching factor; and $L$, the number of levels in
the tree.

Figure 5.2: The Markov model for counting the number of valid children of a valid parent.

Define $r$ to be the fraction of reads to shared data and $w$ to be the fraction of writes to
shared data, where $r + w = 1$.

We first determine the probability that $c$ children of a valid node at level $l$ are valid,
by constructing a Markov model, as shown in Figure 5.2. We use the solution to this model
to calculate $c_l$, the expected value of the number of valid children at level $l$.
Using $c_l$, we calculate $t_l$, the probability of taking a valid path while performing
a node selection, looking down from level $l$. The value of $t$ at the top of the tree is set
to one; this means that the root node will be chosen with probability one and simplifies the
equations.



$$t_l = p_l + (1 - p_l)\,\frac{c_l}{b} \qquad (5.1)$$



Height. Calculating the expected height a read will reach given the model is straightforward.
A read by a node in the class of nodes which are valid will be of height 0. A read
by a node in the class of nodes whose parents are valid but which is not itself valid will be of
height 1. The expected height of a read request, $E[h_r]$, is given in Equation 5.2.



[Equation 5.2 is not legible in the source.]
Figure 5.3: In both of these examples, the grey nodes have copies of the value, and the
black node is attempting to perform a write. The write operation will have to reach the
top level of the tree in order to complete.



In order to calculate the expected height a write request will reach, we must consider
not only whether or not the requesting node has a copy, but also whether or not other nodes
of any classes have copies. A write must progress upwards until it reaches a height above
which only a single node at each level is valid, and those nodes are all ancestors of the
writing node. For example, as shown in Figure 5.3, where the black node is requesting the
write operation and grey nodes have copies of the value, a write operation would need to
reach the top of the tree in both cases. In the first case, the full height of the tree is needed
just to reach any other copies. In the second case, although a shorter height is sufficient to
locate a node with a copy of the block, the write operation must reach the top of the tree in
order to invalidate the other copies. Keeping these rules in mind, we calculate the expected
write height, $E[h_w]$.



[Equation 5.3 is not legible in the source.]



Longest Path. The longest path traversed during a read request is the path of the request
up the tree, down to the node that has it, and then directly to the requesting node, as shown
in Figure 5.4. The expected longest length is thus:



[Equation 5.4 is not legible in the source.]



The longest path for a write is up to the highest node, down to all of the copies, back
up to the top, and then down from the top to the requesting node. Its expected length is
given in Equation 5.5.

[Equation 5.5 is not legible in the source.]

Figure 5.4: The left example shows a read request, the right a write request. In both of
these examples, the grey nodes have copies of the value, and the black node is making the
request. The longest path traversed is shown for both cases. Note that for the write case,
there are other equally long paths not shown.



Number of Messages. The calculation of $E[m_r]$, the expected number of messages per
read operation, is very similar to that for the expected longest read path. The only difference
is that the set of messages sent from the read originator to the top node of the read to confirm
that the read has occurred must be added in.



[Equation 5.6 is not legible in the source.]



The expected number of messages per write operation, on the other hand, requires more
knowledge than the longest write path calculation. This is because the number of messages
depends on how many nodes have read the block since the last write and therefore need
to be invalidated. Remember that $c_l$ is the expected number of valid children of a valid
node at level $l$. For each non-local write, one set of messages is sent from the requester
to the highest node in the tree involved in the write, as shown in Figure 5.5; a full fan-in
and fan-out of acknowledgments and invalidates is sent to all nodes with copies; a final
acknowledgment is sent down to the writer; and write ownership is transferred to the writer.
Figure 5.5: This figure illustrates the number of messages sent for a typical write operation.
The grey nodes have copies of the value, and the black node is performing the write.



The expected number of messages per write, $E[m_w]$, is thus:



[Equation 5.7 is not legible in the source.]



5.2 Applications for Model Verification

Three applications have been employed in the validation of the model. One is a uniform
reference pattern, in which every processor is equally likely to reference all of the data. The
second mimics a basic relaxation pattern, such as a Jacobi relaxation. The third is a
synthetic pattern exhibiting clustering behavior: nodes further away from a fixed "home
location" of data access it less frequently than do closer nodes.

5.2.1 Uniform

The uniform reference pattern fits the model exactly. Uniformity implies that every node
is equally likely to reference any block. Because of this property, there is no locality, so the
entire set of locality parameters $p_l$ should always be zero.



5.2.2 Relaxation

In the particular relaxation we simulated, during every iteration every point of an
n-dimensional mesh updates its value by a function of the values of its 2n neighbors.

Consider a 2-dimensional relaxation implemented on a 2-d grid of processors. The
obvious way to embed the problem is to map a contiguous 2-d portion of the relaxation
array onto a single processor, such that the nearest neighbors of all of the points in a single
processor are either on that processor or a neighbor of that processor. The embedding,
shown in Figure 5.6, is reasonable for a hierarchy as well. For the model, we assume that
the relaxation grid is mapped in the above fashion.

Figure 5.6: A 32 by 32 grid of relaxation data is mapped onto a 4 by 4 grid of processors.
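The block mapping of Figure 5.6 can be checked with a short Python sketch. The function
names are hypothetical, and the grid is assumed here to have non-periodic boundaries (the
text does not say whether the relaxation wraps around). The sketch assigns each grid point to
the processor holding its contiguous 8 by 8 block and confirms that every nearest neighbor
lives on the same processor or on an adjacent one.

```python
def owner(i, j, grid=32, procs=4):
    """Processor coordinates owning grid point (i, j) under a block mapping."""
    block = grid // procs  # 8 points per processor per dimension in Figure 5.6
    return (i // block, j // block)

def neighbors(i, j, grid=32):
    """The (up to) four nearest neighbors of a point; edges are not wrapped."""
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < grid and 0 <= nj < grid:
            yield ni, nj

def max_owner_distance(grid=32, procs=4):
    """Largest processor-grid distance between a point's owner and a neighbor's owner."""
    worst = 0
    for i in range(grid):
        for j in range(grid):
            pi, pj = owner(i, j, grid, procs)
            for ni, nj in neighbors(i, j, grid):
                qi, qj = owner(ni, nj, grid, procs)
                worst = max(worst, abs(pi - qi) + abs(pj - qj))
    return worst
```

A result of 1 from `max_owner_distance()` means no point ever needs data from a processor
further away than an immediate neighbor.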

Note that an exact calculation of the read and write height can be performed for this
application. Define $R_l$ and $W_l$ as the number of read and write operations which reach
level $l$, respectively. These formulas are shown and derived in Equations B.1-B.3 in
Appendix B. The read and write heights for the application can then be exactly expressed as:



[Equations 5.8 and 5.9, giving $h_r$ in terms of $R_l$ and $h_w$ in terms of $W_l$, are not
legible in the source.]



5.2.3 Cluster



The cluster algorithm assumes that there are clusters or groups of processors working on
data. Clusters are said to own blocks. The processors within a given cluster are more likely
to reference blocks owned by the cluster than blocks owned by other clusters. This model
is similar to the one proposed by Qing Yang in [31].

Figure 5.7: In this figure, the black node owns a block. The lighter the color of a node, the
less likely it is to access the block.

Define $e$ as the fraction of all operations by node P which occur to its own blocks; $e$ is
the defining parameter of a cluster application. As shown in Figure 5.7, P accesses blocks
owned by processors in the group of $b$ processors containing but not including P with a
uniform probability which is calculated from $e$. P accesses blocks owned by processors
in the group of $b^2$ processors containing but not including the aforementioned $b$ processors
with a smaller uniform probability. This access probability is calculated as follows:



[Equation 5.10 is not legible in the source.]



This formula implies that the frequency of requests to processors in the next largest
cluster but not in the current one decreases by a factor of two as the clusters increase.



5.3 Simulation of Applications

The synthetic address traces of three applications were simulated using the simulator
described in Chapter 4 in order to determine values for the set of locality parameters with
which to check the model. All applications were simulated both for a 2-dimensional, radix
4, four level tree ($L = 4$, $b = 4$) and for a 3-dimensional, radix 8, three level tree ($L = 3$,
$b = 8$). In both cases this resulted in a 64 processor simulation. As much memory was
allocated to the processors as was necessary to run without incurring cache overflow misses,
in order to simulate infinite cache size.
Table 5.1: These are the values of the parameters used in the simulations. (Only fragments
of the table body survive: the row labels Relaxation and Cluster and the values 200, 400,
and 64.)

Uniform. The uniform address trace consists of references, by every processor, every $T$
simulator steps, to a randomly chosen one of a fixed number of addresses. All addresses are
equally likely to be chosen by each processor. The parameter varied in this trace is the
percentage of writes, $w$.
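A trace of this kind can be generated with a few lines of Python. This is a sketch under
stated assumptions, not the generator used with the thesis's simulator: the function name,
the tuple layout, and the parameter names are all hypothetical, and `write_fraction` is
taken as a fraction in [0, 1] rather than a percentage.

```python
import random

def uniform_trace(num_procs, num_addrs, period, steps, write_fraction, seed=0):
    """Return a list of (step, processor, op, address) references.

    Every processor issues one reference every `period` simulator steps to an
    address drawn uniformly at random; each reference is a write with
    probability `write_fraction` and a read otherwise.
    """
    rng = random.Random(seed)
    trace = []
    for step in range(0, steps, period):
        for proc in range(num_procs):
            addr = rng.randrange(num_addrs)
            op = "W" if rng.random() < write_fraction else "R"
            trace.append((step, proc, op, addr))
    return trace
```

Because every processor draws from the same uniform distribution, no processor favors any
block, which is why this pattern is expected to show no locality.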

Relaxation. The relaxation address trace consists of cycles of reads to neighbors followed
by writes. In the trace every node simultaneously makes the read requests followed by the
write request for the first block, then the read requests followed by a write request for
the second block, etc. In other words, the grid is being updated such that some blocks
are updated with values from a later iteration than others, similar to a Gauss-Seidel
relaxation. There are $T$ simulator cycles between each reference. The amount of the grid
assigned to each node was varied across the simulations. The number of iterations is low
because the simulator performed a warm start for this application. For the 2-dimensional
($b = 4$) relaxation, the percentage of writes was 20; for the 3-dimensional ($b = 8$), the
percentage was 14.
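The two write percentages are consistent with each point reading its 2n neighbors and then
writing once per update. That reading is an inference from the trace description, not
something the text states, and the function name below is hypothetical.

```python
def relaxation_write_fraction(n_dims):
    """Fraction of writes when each update reads 2n neighbors and writes once."""
    reads = 2 * n_dims
    writes = 1
    return writes / (reads + writes)

for n in (2, 3):
    # Prints the dimension and the write percentage rounded to a whole number.
    print(n, round(100 * relaxation_write_fraction(n)))
```

Under this assumption the 2-dimensional case gives 1/5 and the 3-dimensional case gives 1/7,
which round to the 20 and 14 percent quoted above.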



Cluster. The cluster address trace consists of references, by every processor, every $T$
simulator steps, to a randomly chosen one of $N$ addresses, where $N$ is the number of
processors. Accesses by a particular processor to self-owned addresses occur with probability
$e$. The probability of references to other clusters is calculated according to the formula
described earlier in Equation 5.10. The parameters varied in this trace are the percentage
of writes, $w$, and the base probability $e$.
Figure 5.8: The locality parameters $p_l$ measured from the uniform application simulation.

5.4 Locality Parameters Measured from Simulation

The simulation allowed us to measure the set of locality parameters $p_l$ for the applications.

Uniform. As predicted, the values for $p_l$ are nearly zero for the uniform application, as
shown in Figure 5.8. A high point occurs when the percentage of writes is zero; this behavior
is caused by the fact that there are no writes at all. If the simulation were run for a very
long time, eventually nearly every node would have a copy, and then there would seem to
be correlation in the choice of a subtree; a valid subtree node would be more likely to be
selected than an invalid one because there are so many.

Relaxation. As the amount of data per node increases, the application exhibits more and
more locality at every level, as expected. Note that there is tremendous locality for the
references which reach the higher levels, as there are extremely few of them.



Cluster. For all values of $e$, the fraction of references by an owner to its own blocks, and
of $L$, the number of levels, the locality parameter $p_l$ decreases as the write fraction
increases. This is because a write issued from a more remote node causes $p_l$ to decrease
twice: once when the write is issued, to pull the value into the cache, and once when the
more common read or write occurs to the owner node.

Figure 5.9: The locality parameters $p_l$ measured from the relaxation application simulation.

Figure 5.10: The locality parameters $p_l$ measured from the cluster application simulation.
The base reference fraction $e = 0.75$.

Figure 5.11: The locality parameters $p_l$ measured from the cluster application simulation.
The base reference fraction $e = 0.5$.

Figure 5.12: The locality parameters $p_l$ measured from the cluster application simulation.
The base reference fraction $e = 0.25$.

Note that as $e$ increases, $p_l$ is always artificially high. This behavior is caused by the
extremely high value of $e$. When $e$ is 0.75, three-quarters of all requests to blocks owned
by node P are made by node P. This means that the effect mentioned above, where $p_l$
decreases with write fraction, does not often occur, since a write to the owner node followed
by another operation to the owner node (with no intervening writes by other nodes) does
not lower the locality parameter.

5.5 Comparison of Model and Simulation

The predicted and simulated average read and write heights are very similar, and confirm
that the set of locality parameters is a valid way of expressing the behavior of an application.
The predicted number of messages per read and write operation also compares favorably to
the simulation.

Uniform. We examined the average read and write height of the uniform application as
the write fraction is varied, as shown in Figure 5.13. Note that, as expected, using the values
of the set of locality parameters measured from the simulation produced results identical
to just using the value zero, showing that the deviations from zero in the measured $p_l$ are
insignificant.

Figure 5.14 compares the predicted number of messages per operation with the simulated
number. Although the predicted number of write messages is higher than the simulated for
low values of the write fraction, the predicted and simulated numbers nearly match for the
rest of the write fraction range, and the shape of the curve is very similar.

Relaxation. For the relaxation application, we examined the average height characteristics
as a function of the amount of data allocated to each node. The resulting 3-dimensional
(radix 8) and 2-dimensional (radix 4) graphs can be seen in Figure 5.15. Note that the
numbers shown on the x-axis of the graphs represent the amount of data allocated per
dimension per node. In other words, to calculate the actual data per node, cube the number
shown in the figure for the 3-dimensional case, and square the number for the 2-dimensional
case. The directly calculated average heights mentioned in Section 5.2.2 are also included
on the graphs. The exact calculation does not work properly when there is only one data
point allocated per processor.

Figure 5.13: This figure shows the average read and write height model predictions as well
as the simulated ones for the uniform application. Note that two sets of values for $p_l$ were
used: one where $p_l$ was set to zero, and one where $p_l$ was measured from the simulation.

Figure 5.14: This figure compares the predicted number of messages per read and write
operation with the simulated values for the uniform application. The $p_l$ values were
measured from the simulation for the predicted curve.

Figure 5.15: This figure shows the average read and write height model predictions as well as
the simulated ones for the relaxation application. Two different predictions are shown: one
uses the locality parameter model, and one uses an analytical calculation of the relaxation
data motion to directly calculate the heights.

Cluster. Figure 5.16 shows graphs of the write fraction versus the average read and write
characteristics for three values of the base reference parameter: 0.75, 0.5, and 0.25. The
model is more accurate for higher base reference values.

Figure 5.17 compares the number of messages per read and write operation predicted
from the model with the number of messages recorded as sent by the simulation. Base
reference rates of 0.75 and 0.25 are shown in the figure. Note that in these graphs, the
number of messages per read and write seems over-predicted for higher values of the base
reference rate. In fact, the number of simulated messages is low. This effect is caused by
the method of gathering message statistics in the simulation: messages which are sent from
a physical processor to itself, even if the virtual nodes being represented change, are not
counted. Because of the way the mapping of virtual nodes to physical processors is performed
(see Section 2.2 for details), many more messages are sent from a processor to itself when
the base reference rate is high.

Figure 5.16: This figure shows the average read and write height model predictions as well
as the simulated ones for the cluster application.

Figure 5.17: This figure compares the simulated and the predicted number of messages per
read and write operation for the cluster application.

Model Parameters
  w    0.3
  b    8
  L    3, 4, 5, 6
  N    64, 512, 4096, 32768

Model Parameters
  w    0.3
  b    4
  L    4, 5, 6, 7, 8, 9
  N    64, 256, 1024, 4096, 16384, 65536

Table 5.2: These are the values of the input parameters for the model.

5.6 Protocol Characterization for Large Machine Sizes

In this section, the verified model is used to predict the behavior of the protocol on machine
sizes too large to simulate.



5.6.1 Parameters

In order to simplify the study, several of the model input parameters have been constrained,
as shown in Table 5.2. The fraction of writes is fixed at 0.3: a reasonable choice
for parallel applications [28] as well as one at which the messages per operation calculation
is accurate. Trees with two different radices, eight and four, are modeled. The range of
machine sizes is chosen to show the trend of the curves.
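The machine sizes in Table 5.2 follow from the radix and the number of levels. The relation
used below, N = b^(L-1), is inferred from the simulated configurations (a four-level radix 4
tree and a three-level radix 8 tree both give 64 processors); the function name is
hypothetical.

```python
def machine_size(radix, levels):
    """Number of leaf processors in a full tree: radix ** (levels - 1)."""
    return radix ** (levels - 1)

# Radix eight, three to six levels; radix four, four to nine levels.
radix8_sizes = [machine_size(8, levels) for levels in range(3, 7)]
radix4_sizes = [machine_size(4, levels) for levels in range(4, 10)]
```

Each added level multiplies the machine size by the radix, which is why a logarithmic
machine-size axis is equivalent to a linear axis in the number of levels.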

The set of locality parameters is fixed to a single value for all levels, rather than a set
of values for each level. This fixing still provides interesting results, because $\forall l: p_l = 0$
is a uniform reference input stream, and $\forall l: p_l = 1$ is a completely local input stream.
Most applications will lie between these two extremes. Furthermore, the values of $p_l$ for
different $l$ seen in the uniform and relaxation applications were very close, and the values
in most of the cluster applications were similar.
5.6.2 Average Height

Figure 5.18 shows the average height per request as a function of the machine size
and the locality. The "Height/Operation" characteristic is determined by weighting the
"Height/Read" and the "Height/Write" values by the write fraction. Results for both radix
eight and radix four trees are shown. Note that the "Machine Size" axis is plotted on
a logarithmic scale; alternately, the scale can be viewed as linear, relabeling that axis as
"Number of Levels" with the values 3-6 for radix eight, and 4-9 for radix four.

There are two trends to observe. First note the importance of locality, especially with
larger machine sizes. The second effect is that of machine size. The average heights all grow
sub-linearly with machine size, and nearly linearly with the number of levels. Note, however,
that with a large machine size and low locality, nearly the entire tree is being traversed
during an average request. This behavior is clearly unacceptable.
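The weighting behind the "Height/Operation" characteristic in Figure 5.18 can be written
out as a one-line function. This is a sketch: the function name is hypothetical and the
sample heights in the usage line are made-up illustrative values, not results from the model.

```python
def height_per_operation(height_per_read, height_per_write, write_fraction):
    """Weight the read and write heights by the read and write fractions (r + w = 1)."""
    read_fraction = 1.0 - write_fraction
    return read_fraction * height_per_read + write_fraction * height_per_write

# Illustrative values only: a read height of 2.0 and a write height of 4.0 at w = 0.3.
example = height_per_operation(2.0, 4.0, 0.3)
```

With the write fraction fixed at 0.3, reads contribute 70 percent of the weight, so the
per-operation curve sits closer to the read curve than to the write curve.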

5.6.3 Longest Path

Figure 5.19 shows the effect on the longest path per request as a function of the machine
size and the locality. The form of these results is very similar to that of the average height.
The main point to note about these graphs is the sheer number of nodes each request will,
on average, have to pass through. Even if the network bandwidth were large enough to
support this many requests, the nodes need to examine each message passing through, and
would likely have long queues of pending messages to examine.

5.6.4 Number of Messages

The number of messages sent per request as a function of the machine size and the
locality is shown in Figure 5.20. The shape of the curve of number of messages sent per
read is similar to those discussed earlier. The curves for the number of messages sent per
write and the number of messages sent per operation (the weighted combination of reads
and writes), on the other hand, are different.

Because the number of messages sent per write depends not only on the distribution
of the nodes with copies of the block, but also on the number of nodes with copies of the
block, the effects of machine size and locality are much more pronounced. Instead of varying
logarithmically, where the main gains are for locality in the 0.75 to 1 range, the number of
messages per write versus locality curve is barely affected by small changes for high locality;
as the locality decreases, the number of messages sent per write increases polynomially. The
degree of the polynomial varies with machine size, implying that while applications with
poor locality may perform reasonably on small machines, they will swamp large machines.
The number of messages sent per write and per operation as a function of machine size is
actually sub-linear. As a function of the number of levels, however, the number of messages
sent is definitely quadratic.

Figure 5.18: This figure shows the predictions for average height per operation as a function
of machine size and of locality. The left graphs are for radix eight trees; the right graphs
are for radix four.

Figure 5.19: This figure shows the predictions for the length of the longest path traveled
per request as a function of machine size and of locality. The left graphs are for radix eight
trees; the right graphs are for radix four.

Figure 5.20: This figure shows the predictions for the number of messages sent per request
as a function of machine size and of locality. The left graphs are for radix eight trees; the
right graphs are for radix four.

5.7 Issues

Although the results described in this chapter are of interest in examining the performance
of the protocol, there are many extensions that could be done to provide more insight
into the abstract protocol behavior. Many applications should be analyzed to determine
the precise meaning of the set of locality parameters. A better estimate of the locality
parameter could be used. Finally, the data in an application could be divided into sets, and
the locality parameters separately calculated for each one.

Currently, the locality parameter set can only be derived for an application by measuring
the parameters from simulation. We have performed some initial work towards deriving the
set of locality parameters from a spatial locality model of an application, such as that
available for the cluster application. The derivation works best, however, for applications
which exhibit a very high degree of clustering. More work needs to be done in this area.

Using a flat set of locality parameters is not necessarily realistic. For large applications
running on massively parallel machines, we might expect less sharing to occur near the top
of the hierarchy, and more at the bottom. Studies need to be done of applications to provide
insight as to what the hierarchical locality of applications actually is like.

For applications which have a large variance in the types of data referencing, several
sets of locality parameters can be used to avoid averaging effects. This would allow one
to separate widely shared data such as synchronization variables from less used ones. This
separation is useful because an application may stall due to the high sharing of
synchronization variables. This method might also provide new insight into the interactions
between shared data and program execution-time behavior.

5.8 Summary

In this chapter we have proposed a method of expressing locality in applications mapped
onto hierarchical architectures. We have used this model to predict the average height per
request, the average longest path per request, and the average number of messages sent per
request. We used three applications in order to validate the model: a uniform reference
stream, a relaxation algorithm, and a clustering data-reference stream.

After validating the model, we employed it in the prediction of the abstract performance
of very large machines as a function of the locality, studying how the model outputs varied
with machine size and locality. The most important result is that locality is extremely
important in an application. As machine sizes grow, the locality becomes increasingly
important for reducing latency.

We will use the abstract model as input to an embedded model in Chapter 6. The
embedded model describes how the protocol runs when mapped onto particular machines.
This will allow us to study how the protocol behaves under conditions where requests are
not allowed to send an unlimited number of messages without penalty.



Chapter 6

Embedded Analysis



This chapter extends the abstract analysis of Chapter 5 to show how embedding PHD into
a machine affects the behavior of the protocol. We use the mapping described in Section 2.2
to embed the protocol into a k-ary n-cube. The embedded model describes this mapping,
as well as the configuration of the architectures being studied.

In our study we find that multithreading is only useful for approximately two to four
threads; interleaving more than that does not decrease the overall latency. For small
machines and high locality applications, this limitation is due mainly to the length of the
running threads. For large machines with medium to low locality, this limitation is due
mainly to the large protocol overhead.

We also consider the addition of controllers to the processing nodes. We will see that
the gains from the addition of these controllers are not large enough to justify hardware
which is more expensive than processors. In no case does the addition of the controllers
save more time than doubling the number of processors.

We first provide a brief description of the embedded model in Section 6.1 and derive the
necessary inputs. We then characterize the behavior of the mapped protocol in Section 6.2
for several different architectures. Finally, we discuss what further issues need to be studied
in Section 6.3. An alphabetical listing of all of the variables defined in this thesis can be
found in Table A.1.
6.1 The Embedded Model

The embedded model is basically the model derived by Johnson in [13], slightly modified
to suit our purposes. The abstract model is used to generate the inputs to the embedded
model.

The embedded model is used to study two main architectural configurations: a machine
in which protocol activities are handled by the same processor which is attempting to do
work, as outlined in Section 2.2, and a machine in which protocol activities are handled by
a separate controller.

6.1.1 Model Overview

Johnson developed a framework for modeling how communication affects performance. His
framework consists of three parts: a network model, an application model, and a transaction
model. These three are combined into a single model in order to provide feedback between
each subsystem: nodes will be unable to inject messages into the network faster than the
transaction latencies will allow. The model is fully described in [13]; only the parts of the
model which have been changed for this analysis will be discussed in detail.

The embedded model directly uses the application and the network models. An
application consists of threads running on processors. The threads run until they make off-node
requests (communication transactions). In the absence of multithreading, the threads
suspend until their transactions finish. If there is multithreading, and there are still runnable
threads, a context switch occurs, and a new thread is started.

In Johnson's model the application model invokes communication transactions; in our
embedded model it invokes off-node requests to shared memory. The off-node requests
are modeled identically to the transaction model, except that the meaning of one of the
parameters is different. $T_f$, the fixed delay of Johnson's model, represents the time necessary
to process protocol requests by a non-leaf node. As such, it becomes a function of how many
messages are sent.

The network model is used to determine average message latency, given an input message
size, injection rate, and communication distance. The network is assumed to be a k-ary
n-dimensional mesh, with separate unidirectional channels in both directions.

Table 6.1: The additional basic input parameters needed for the embedded model.

$c$      Average number of messages in critical path of a non-local shared-memory request
$g$      Average number of messages per non-local shared-memory request
$B$      Average message size (in flits)
$d$      Average distance a message travels (in hops)
$k_d$    Average distance a message travels in each dimension
$T_r$    Average thread run length between successive non-locally satisfiable requests
$T_f$    Non-network overhead to satisfying a non-local shared-memory request

Table 6.2: The derived input parameters needed for the embedded model.

6.1.2 Model Inputs

The embedded model takes as input many parameters. Some of these parameters have been
discussed in the abstract analysis chapter, and vary depending on the application. Other
parameters, shown in Table 6.1, need to be specified only when the protocol is mapped to
an architecture. The embedded model uses a third set of parameters, derived from the first
two sets, as its actual inputs. This third set is listed in Table 6.2, and will be derived in
this section.



Locally Satisfiable Shared-Memory Requests. We first calculate the number of requests
to shared memory that are locally satisfiable, to use in later equations. We determine
the expected fraction of locally satisfiable reads by calculating the probability that a request
will come from a node in the class of nodes which are valid.

$$E[Z_r] = \prod_{l=1}^{L} t_l \qquad (6.1)$$

We find the expected fraction of locally satisfiable writes by calculating the probability
that a request will come from a node in the class of nodes which are valid, and all of whose
ancestors are the only valid nodes in their set of siblings.

L 
^Z^] =nW (6.2) 

1=1 

The expected fraction of locally satisfiable shared-memory requests is just the weighted
fraction of $Z_r$ and $Z_w$.

$$E[Z] = r Z_r + w Z_w \qquad (6.3)$$
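
As a quick numerical illustration of Equation 6.3, the following sketch combines the read and write fractions of locally satisfiable requests. It is a minimal sketch rather than code from the thesis: the function name and the sample values of $Z_r$ and $Z_w$ are assumptions chosen for illustration, and the read fraction is assumed to be $r = 1 - w$.

```python
def local_fraction(z_r: float, z_w: float, w: float) -> float:
    """Expected fraction of locally satisfiable requests (form of Equation 6.3).

    z_r, z_w: fractions of reads and writes that are locally satisfiable.
    w: write fraction; the read fraction is assumed to be r = 1 - w.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("write fraction must lie in [0, 1]")
    r = 1.0 - w
    return r * z_r + w * z_w


# Hypothetical inputs: 70% of reads and 50% of writes are local, w = 0.3.
z = local_fraction(z_r=0.7, z_w=0.5, w=0.3)
print(f"Z = {z:.2f}")
```

Because writes are locally satisfiable less often than reads, a larger write fraction pulls the combined value toward $Z_w$.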

Number of Messages in Critical Path. The calculation of the expected number of
messages in the critical path of a non-locally satisfiable shared-memory request, $c$, is similar
to the calculation of the longest path for an operation ($l_r$ and $l_w$ in Equations 5.4 and 5.5).
There are, however, two differences. First, we condition the calculation on non-local
operations by dividing by the fraction of non-local requests. Second, we only want to consider
those messages which actually need to be sent off-node. In the embedding, a parent node
and its $b$ children for a particular block correspond to $b$ nodes. This means that one of the
children is situated on the same physical processor as its parent. Equation 6.4 gives the
expected number of messages for a read request.

^cr] =r^E(^2/.+l)(l-f,) n ii (6.4) 

1-Zr ^^ \ b ) ^^^^ 

The critical path for a write request contains a fan-in and fan-out to all nodes with
copies of the block. We make the reasonable assumption that at least one of these paths
will contain no message sends between nodes mapped to the same processor, so the expected
number of messages in the critical path for a write operation is just the expected number in
the longest path for a write, conditioned on the non-local factor, as shown in Equation 6.5.

$$E[c_w] = \frac{1}{1 - Z_w} E[l_w] \qquad (6.5)$$

The expected number of messages in the critical path of a general request is calculated
by weighting $c_r$ and $c_w$ by the write fraction.

$$E[c] = r c_r + w c_w \qquad (6.6)$$

Number of Messages. The calculation of the expected number of messages sent for
non-locally satisfiable shared-memory requests, $g$, is similar to that of the number of messages
sent in the abstract model ($m_r$ and $m_w$ in Equations 5.6 and 5.7). There are two differences
between these calculations, the same as described for Equations 6.4 and 6.5 above. The
expected number of messages for a read operation is given in Equation 6.7.

^9r] =3^E(^3/.+l)(l-f,) n ii (6.7) 

'" h=l l=h\-l 

A write operation generates many messages, some of which are expected to stay local
to a physical processor. Since we are not counting the critical path length, just the total
messages sent here, we do weight by the branching factor, as shown in Equation 6.8.



^9, 



rVS ((^2/.+!+ "^2^:11 c] {l-vit,) n W) (6.8) 

^^"'fclVV" ° l=le=l-l J l=h^l J 



The expected number of messages sent by a general request is calculated by weighting
$g_r$ and $g_w$ by the write fraction.

$$E[g] = r g_r + w g_w \qquad (6.9)$$

Flits per Message. The expected number of flits sent per message is dependent on the
exact machine to which the protocol is mapped. Define $f$ as the number of flits per word,
and $C$ as the cache line size in words. We assume 32-bit words for the purpose of this
calculation. A read sends the 4-word find_lowest_common_for_read message, the 3-word
read message, the $(3 + C)$-word read_data message, and the $(4 + C)$-word confirm_value
message. These messages are described in Table 2.4.

1 -Zr ^^ Sh+1 ^^^^ 

A write operation sends the 4-word find_lowest_common_for_write message, the 3-word
lock message, the 4-word ack and ack1 messages, the $(2 + C)$-word s_writedown message, and
the 3-word write_ok message.


The expected number of flits per general message sent, $B$, is weighted by the number of
messages sent by each type of operation as well as the frequency of each operation.

$$E[B] = \frac{r g_r B_r + w g_w B_w}{r g_r + w g_w} \qquad (6.12)$$
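
The weighting described for Equation 6.12 is a message-weighted mean: each operation type contributes in proportion to how often it occurs and how many messages it sends. A minimal sketch, assuming that form, follows. The function name and all sample values are hypothetical.

```python
def weighted_per_message(r, w, g_r, g_w, x_r, x_w):
    """Message-weighted mean of a per-message quantity x over reads and writes.

    r, w:     read and write fractions
    g_r, g_w: messages sent per read and per write
    x_r, x_w: per-message quantity (for example flits) for reads and writes
    """
    total_messages = r * g_r + w * g_w
    if total_messages <= 0:
        raise ValueError("no messages are sent")
    return (r * g_r * x_r + w * g_w * x_w) / total_messages


# Hypothetical inputs: reads send 4 messages of 12 flits on average,
# writes send 10 messages of 8 flits on average, write fraction 0.3.
b_avg = weighted_per_message(r=0.7, w=0.3, g_r=4.0, g_w=10.0, x_r=12.0, x_w=8.0)
print(f"average flits per message: {b_avg:.3f}")
```

The same helper applies to any per-message quantity that is averaged in this way, such as the distance per message.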

Distance per Message. The expected distance in hops that the average message travels,
$d$, depends on the radix $k$ and the number of dimensions $n$ of the machine to which the
protocol is mapped. Note that the embedding is such that $b$, where $b$ is the branching
factor of the tree, is equal to $2^n$.

We first determine the expected number of hops needed for a message sent between
a node at level $l$ and its parent, where the parent and the node are located on separate
processors. This is just the number sent for a 2-ary n-cube [2][22], scaled by the dilation
caused by higher levels.

W=^^.i = ^^.i (6.13) 

^ ^ 2(6-1) 2" -1 ^ ' 
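
The unscaled 2-ary n-cube term can be checked by brute force. The sketch below assumes that "the number sent for a 2-ary n-cube" means the mean number of hops between two distinct, uniformly chosen nodes. That interpretation and the function name are assumptions, and the dilation factor for higher levels is not modeled.

```python
from itertools import product


def avg_hypercube_distance(n: int) -> float:
    """Mean Hamming distance between two distinct nodes of a 2-ary n-cube."""
    nodes = list(product((0, 1), repeat=n))
    total_hops = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                # hops = number of coordinates in which the two nodes differ
                total_hops += sum(x != y for x, y in zip(a, b))
                pairs += 1
    return total_hops / pairs


n = 3                      # b = 2**n = 8 children per tree node
b = 2 ** n
closed_form = n * b / (2 * (b - 1))
print(avg_hypercube_distance(n), closed_form)
```

For $b = 2^n$ the closed form $n b / (2(b - 1))$ equals $n 2^{n-1} / (2^n - 1)$, which agrees with the enumeration.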

We also need to determine the number of hops that a hierarchy-circumventing message,
such as read_data, will take. A message of this form is sent from a node directly to
another node. The two nodes are guaranteed not to share a common sub-cube smaller than
that of their lowest common ancestor. The values were counted, and are enumerated
in Table 6.3.

Table 6.3: This table shows the empirically determined number of hops a message sent
directly from a node in a subtree to one outside of its subtree takes, if the lowest common
ancestor of those two nodes is at level $l$. This calculation assumes a uniform distribution.

The expected distance per read message is given in Equation 6.14. Note that the number
of hops sent between each level is summed and then multiplied by the number of tree
traversals. We do not worry about excluding the messages which are not actually sent; the
difference is negligible.

The expected distance per write message is similarly calculated in Equation 6.15.



^^-] =r^^^ 2fe+i+2 v^ n^^ c V-'^hth) n ti^i 

i ^"'fcl Zn+i+Z 2^Z=l lle=Z-l Ce ;^;^i 

(6.15) 
The expected distance per general message type sent, $d$, is weighted by the number of
messages sent by each type of operation as well as the frequency of each operation.

$$E[d] = \frac{r g_r d_r + w g_w d_w}{r g_r + w g_w} \qquad (6.16)$$



Distance per Dimension. The average distance a message travels in each direction, $k_d$,
assuming independence, is just $d/n$. In Johnson's model, when $k_d$ is less than one, the
average per-hop latency for the head of a message is fixed to one.

Run Length Between Requests. The average thread run length between successive
requests to shared memory, $R$, is one model input. Another is $h$, the average time needed
to satisfy a locally satisfiable shared-memory request, i.e. one that does not cause any
off-node traffic. The average thread run length between successive requests to non-locally
satisfiable shared memory, $T_r$, is a function of these two parameters. For the purpose of
this model, $T_r$ is considered to be the useful work done by the processor.

^Tr] =Rh Y^i^^ -) (6-17) 
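
One way to see how $R$, $h$, and the locally satisfiable fraction combine is a geometric argument. Assume each request is independently locally satisfiable with probability $Z$. A thread then issues $1/(1 - Z)$ requests on average per non-local request, running for $R$ before each one and paying $h$ for each of the $Z/(1 - Z)$ local ones. This independence assumption and the resulting expression are an illustrative sketch, not necessarily the exact form of Equation 6.17.

```python
def run_length_between_nonlocal(R: float, h: float, Z: float) -> float:
    """Average run length between successive non-local requests.

    Assumes each request is local with independent probability Z, so that
    T_r = R / (1 - Z) + h * Z / (1 - Z) = (R + h * Z) / (1 - Z).
    """
    if not 0.0 <= Z < 1.0:
        raise ValueError("Z must lie in [0, 1)")
    return (R + h * Z) / (1.0 - Z)


# Hypothetical inputs: R = 500 cycles, h = 10 cycles.
for Z in (0.0, 0.5, 0.75):
    print(Z, run_length_between_nonlocal(R=500.0, h=10.0, Z=Z))
```

With no locally satisfiable requests the result reduces to $R$, and it grows quickly as $Z$ approaches one.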

Non-network Overhead. The mapping of non-leaf nodes onto the same processors as
leaf nodes guarantees that all processors will need to spend some time processing protocol
transitions, instead of performing actual work. We model this non-network overhead, $T_f$,
as a function of $g$, the average number of messages sent per operation; $M$, the average time
in processor cycles needed to handle a protocol request; $b$, the branching factor; and $N_i$,
the time taken up by the network interface.

If every node is sending $g$ messages on average per request, every node will have to
process $g$ messages. The cost of processing a message is $M$ if it stays on the current
processor, and $M + N_i$ if it comes in from another processor.

$$E[T_f] = g M + \frac{b - 1}{b} g N_i \qquad (6.18)$$
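
A minimal sketch of the overhead calculation follows. It assumes that one of every $b$ messages stays on the local processor, because one child shares a processor with its parent, so that a fraction $(b - 1)/b$ of messages also pays the network interface cost $N_i$. That reading of Equation 6.18, the function name, and the value $g = 10$ are assumptions. The other sample values are taken from the parameters quoted in this chapter.

```python
def non_network_overhead(g: float, M: float, b: int, N_i: float) -> float:
    """Non-network protocol overhead T_f in processor cycles per request.

    Every message costs M cycles to handle; messages arriving from another
    processor (assumed to be a fraction (b - 1) / b) also cost N_i cycles.
    """
    if b < 1:
        raise ValueError("branching factor must be at least 1")
    return g * M + (b - 1) / b * g * N_i


# g = 10 is hypothetical; M = 20, b = 8, N_i = 15 are values quoted in the text.
print(non_network_overhead(g=10.0, M=20.0, b=8, N_i=15.0))
```

The overhead is linear in $g$, so any reduction in the number of messages per request lowers $T_f$ proportionally.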



6.1.3 Model Constraints

The model developed by Johnson is only valid under certain conditions, where the available
parallelism is smaller than the communication transaction latency, so that despite
multithreading, the processors will be idle part of the time. In the embedded model, the extra
time where threads would normally be waiting can be used to support protocol transactions.

In an architecture with no controller, where every processor both runs threads and
supports protocol transactions, the analysis is only valid as long as the thread run length,
the context switch time, and the protocol transaction overhead time do not exceed the
transaction latency time, as shown in Equation 6.19.

$$T_t > p_m (T_s + T_f) + (p_m - 1) T_r \qquad (6.19)$$

When this condition is false, the average inter-transaction issue time is limited as
follows:

$$t_t = T_r + T_s + T_f \qquad (6.20)$$

When the protocol requests are handled by a controller, there are two constraints that
must be met. First, the transaction latency time must be greater than the thread run length
and the context switch time:

$$T_t > p_m T_s + (p_m - 1) T_r \qquad (6.21)$$

When this condition is false, the average inter-transaction issue time is limited to
$T_r + T_s$. The transaction latency time plus the thread run length must also be greater than the
time required by the controller to process the protocol requests:

$$T_t > p_m T_f - T_r \qquad (6.22)$$

When this condition is false, the protocol is limited by the speed of the controller to a
time of no less than:

$$t_t = T_f \qquad (6.23)$$

When both of those conditions are false, the protocol is limited by the largest of the
above transaction issue times.
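
The constraints above can be read as a maximum over competing limits. Rearranging Equation 6.19 gives $(T_r + T_t)/p_m > T_r + T_s + T_f$, which suggests that the latency-bound issue time is $(T_r + T_t)/p_m$. That expression is inferred from the inequalities rather than stated in the text. The sketch below is written under that assumption, with the transaction latency $T_t$ treated as a fixed input although in the full model it depends on network load. The function names and sample values are hypothetical.

```python
def issue_time_no_controller(T_t, T_r, T_s, T_f, p_m):
    """Inter-transaction issue time when the processor also runs the protocol."""
    latency_bound = (T_r + T_t) / p_m      # assumed form, see lead-in
    processor_bound = T_r + T_s + T_f      # limit of Equation 6.20
    return max(latency_bound, processor_bound)


def issue_time_with_controller(T_t, T_r, T_s, T_f, p_m):
    """Inter-transaction issue time when a separate controller runs the protocol."""
    latency_bound = (T_r + T_t) / p_m      # assumed form, see lead-in
    processor_bound = T_r + T_s            # limit when Equation 6.21 fails
    controller_bound = T_f                 # limit of Equation 6.23
    return max(latency_bound, processor_bound, controller_bound)


# Hypothetical inputs, in cycles.
params = dict(T_t=1000.0, T_r=100.0, T_s=20.0, T_f=50.0)
for p_m in (1, 4, 8, 16):
    print(p_m,
          issue_time_no_controller(p_m=p_m, **params),
          issue_time_with_controller(p_m=p_m, **params))
```

In this example, adding threads stops helping once the latency-bound term drops below the processor-bound or controller-bound term, which is the saturation behavior the constraints describe.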

6.2 Protocol Characterization



There are several fundamental questions we must address. The first is the smallest
average inter-transaction issue time that can be sustained. This time depends on the degree
of multithreading: with more threads running, more latency can be hidden. Multithreading
is only useful up to a certain point, however. Another important question is what sort of
overhead is seen, and what are its sources? Are the limits set by protocol overhead or by
network latency? This section first describes the parameters chosen for the study, then
shows the results of the study.

Shared Model Parameters

$T_s$    20 cycles
$M_l$    10 cycles
$N_i$    15 cycles
$C$      8 words
$f$      2 flits
$p_m$    1, 2, 4
$P$      0-0.9
$w$      0.3
$b$      8
$L$      3, 4, 5, 6
$N$      64, 512, 4096, 32768

Table 6.4: These are the values of the input parameters which are shared for the different
architectures.



6.2.1 Parameters

We chose to study three machine configurations, representing a variety of architectures.
Across all three configurations particular parameters, listed in Table 6.4, were held constant,
since we were most interested in varying the other parameters.

Figure 6.1 shows the number of flits per message, and the distance traveled per message,
as a function of machine size and locality. These parameters are used as input to the
embedded model. The number of flits per message is higher for high locality; this effect
is caused by the domination of longer messages, such as read_data. The more nodes which
need to be invalidated, the lower the number of flits will be. The distance is comparatively
large for high-locality large machines because the distance grows exponentially as one
ascends the hierarchy, yet all levels of the hierarchy are assigned an equal locality
parameter. This implies that further studies with a set of graduated locality
parameters rather than flat ones might be interesting. The number of critical messages in
an operation, $c$, and the total number of messages sent per operation, $g$, are similar to $l$
and $m$ graphed in Figures 5.19 and 5.20 and are therefore not shown here.

Figure 6.1: This figure shows the predictions for the average number of flits needed per
message, and the average distance that a message travels, as functions of machine size and
of locality.

Table 6.5 lists the values of the parameters which were varied across the architectures.
These values correspond to three situations:

1. Optimistic-Optimistic: The run length between shared-memory references is long
(such as on the J-Machine, where floating point operations are implemented in
software), and the protocol overhead is low.

2. Optimistic-Pessimistic: The run length between references is high, and the protocol
overhead is high (for example if the protocol was implemented entirely in software).

3. Pessimistic-Optimistic: The run length between messages is very short, and the
protocol overhead is low.

Table 6.5: These are the values of the input parameters which are varied across the different
architectures.

We additionally consider the case where the non-leaf protocol overhead is handled by a
separate controller for all three architectures.

Figure 6.2 shows the average run length between requests to off-node shared memory,
$T_r$, and the non-network-related protocol processing overhead, $T_f$. This run length is an
input to the embedded model, and was calculated from the average run length between requests
to shared memory ($R$) in Equation 6.17. Note that $T_r$ varies from being almost exactly
$R$ for machines with no locality, to approximately $4R$ for small machines with a locality of
0.9, and $2R$ for large machines with a 0.9 locality. The overhead, shown in Equation 6.18,
is another input to the model, and is mainly affected by the number of messages sent to
satisfy an operation.

6.2.2 Architectures Without A Separate Cache Controller

In all three architectures studied, multithreading is only useful up to two threads. In other
words, interleaving more than two threads does not increase the transaction issue rate. For
small machines and high locality applications, this limitation is due mainly to the length of
the running threads. For large machines with medium to low locality, this limitation is due
mainly to the protocol overhead being too large.

Inter-Transaction Issue Time

Figure 6.3 shows the average inter-transaction issue time for one thread and two threads.
Note that increasing the number of threads from one to two provides little speedup. As
expected, the lower protocol processing times create much better transaction issue times.
Since the run length varies for different machine sizes and localities, we must look at what
percentage of time is taken up in the protocol overhead.

Figure 6.2: The left half of this figure shows the predictions for the average run length
between off-node references to shared memory as a function of machine size and of locality.
The right side shows the predicted protocol processing overhead time (dependent on the
number of messages sent) per off-node shared-memory request.

Figure 6.3: This figure shows the predictions for the average inter-transaction issue time as a
function of machine size and of locality. The graphs on the left side are for no multithreading
($p_m = 1$); the graphs on the right are for a multithreading of 2 ($p_m = 2$).

Protocol Overhead

We examine the protocol overhead in order to see how much of the transaction latency is
caused by overhead and how much represents work being done. Overhead ($O$) is defined as
the fraction of the average transaction issue time not spent running:

$$O = 1 - \frac{T_r}{t_t} \qquad (6.24)$$

Figure 6.4 shows these results.
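
The overhead definition can be evaluated directly once the run length and the issue time are known. The following is a minimal sketch, assuming the overhead is one minus the ratio of the run length $T_r$ to the inter-transaction issue time $t_t$. The function name and sample values are hypothetical.

```python
def protocol_overhead(T_r: float, t_t: float) -> float:
    """Fraction of the inter-transaction issue time not spent running threads."""
    if t_t <= 0:
        raise ValueError("issue time must be positive")
    if T_r > t_t:
        raise ValueError("run length cannot exceed the issue time")
    return 1.0 - T_r / t_t


# Hypothetical inputs: 100 cycles of useful work out of every 170 cycles.
print(f"{protocol_overhead(T_r=100.0, t_t=170.0):.1%}")
```

An overhead of zero means every cycle between transactions is useful work, and values near one mean the processor is almost entirely occupied by protocol processing, context switching, or waiting.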

The Optimistic-Optimistic case does best, as expected. The Optimistic-Pessimistic case
is tolerable for small machines with high locality. The Pessimistic-Optimistic case, on the
other hand, shows extremely high overhead for nearly all conditions. If the typical run
length is only 50 cycles, as assumed for this case, the protocol processing time needs to be
reduced before this system can be effectively used. Note that there is very little speedup
from going to two threads; in general, only small machines with poor locality benefit.

6.2.3 Architectures With A Separate Cache Controller

In order to increase the performance of the protocol, we consider the case where a separate
controller exists to handle protocol requests. This situation will only be beneficial in two
cases: first, where the controller can be added to the system more cheaply than another
processor, and secondly, where the controller can be designed to be significantly faster than
a processor. If neither of these conditions is true, there is no benefit to using a controller.

We model the architectures with a separate cache controller by allowing the
inter-transaction issue time to decrease until it reaches the limits caused by either the protocol
overhead or the run-length overhead. We assume that the controller operates at the same
speed as the processor did in the earlier experiment; the gains all come from having a
separate protocol handler, not from immense controller speed. For the Optimistic-Optimistic
case ($R = 500$; $M = 20$), multithreading up to four results in better inter-transaction issue
times. For the Optimistic-Pessimistic case, up to eight threads can
be profitably used to reduce latency. For the Pessimistic-Optimistic
case, only four threads provide speedup. Again, these limitations are due to the thread run
length for small machines, or machines with very high locality, and to the protocol overhead
for large machines with medium to low locality.

Figure 6.4: This figure shows the predictions for the protocol overhead as a function of
machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$);
the ones on the right are for a multithreading of 2 ($p_m = 2$).

Figure 6.5: This figure shows the predictions for the average inter-transaction issue time as a
function of machine size and of locality. The graphs on the left side are for no multithreading
($p_m = 1$); the graphs on the right are for the largest possible useful multithreading, as
described in the text.

Inter-Transaction Issue Time

Figure 6.5 shows the average inter-transaction issue time for one thread and for the
maximum number of useful threads, as described above. Note that we do now see some
improvement due to multithreading, which can be better observed in the overhead graphs.

Protocol Overhead 

We again examine the protocol overhead in order to see how much of the transaction latency
is caused by overhead and how much represents work being done. Figure 6.6 shows these
results.

These results are much better than before. The use of a controller provides enormous
gains in practicality. For an Optimistic-Optimistic architecture, we can expect to efficiently
run the protocol at locality as low as 0.7 even on very large machines. The
Optimistic-Pessimistic architecture performs much better than before, although it still has too much
overhead for large machines. The Pessimistic-Optimistic architecture has also improved, but
one would still not want to use the protocol with this embedding on such an architecture.

Most of the speedup occurs when going from one thread to two. The gains from going
beyond that are small, and occur only on the boundary between too much work and too
much overhead. For the Optimistic-Optimistic architecture, the gains occur on the diagonal
line between large machines with lots of locality and small machines with little locality. For
the Optimistic-Pessimistic case, the line moves closer to the small machines with high
locality. This trend extends to the Pessimistic-Optimistic case, implying that the gains all
occur only for small machines with high locality.

Note that these gains are of course not large enough to justify controllers which are
more expensive than processors. In no case does the addition of the controllers save more
time than doubling the number of processors would.

6.3 Issues

The results described in this chapter provide some insight as to how the protocol actually
behaves when mapped to a k-ary n-cube in the manner described in Section 2.2. There
are many more interesting experiments to be done, however. First, other configurations
of machines which would work better with the protocol should be studied. Second, other
embeddings of the protocol to k-ary n-cubes should be considered. Finally, other cache
coherence protocols should be studied to determine the competitiveness of the performance
of the Protocol for Hierarchical Directories.

Figure 6.6: This figure shows the predictions for the protocol overhead as a function of
machine size and of locality. The graphs on the left side are for no multithreading ($p_m = 1$);
the ones on the right are for the largest possible useful multithreading, as described in the
text.

The main limitation of using this mapping to embed the protocol to an architecture is
clearly the protocol overhead. There are several ways to fix this problem. One is to build
fast controllers which can independently process the protocol requests, which is one of the
goals of the MIT Alewife project [3]. The cost of adding such a controller to the machine
must be balanced against the potential speed benefits.

Another way to reduce the overhead is to guarantee that high locality is maintained,
with references to shared memory rare in comparison to the time needed to process
protocol requests. In order to do this, compiler technology for static data placement must be
improved. Programs must be compiled specifically to reduce the amount of data sharing.
This technology would benefit all cache coherence protocols.

None of the architectures studied in this chapter was ever limited by the speed of the
network. This indicates either that the assumptions imply a processor-network speed
mismatch, and that the network is too fast, or that the protocol is fundamentally too slow.
Studies using a very fast controller with fast processors, or fast controllers and slow
processors, could be used to evaluate how the protocol performance is affected by the embedding.

This study does not indicate whether or not PHD would be more useful for large
machines than schemes with limited directories, or even without caching. A study comparing
these schemes for different values of the locality parameter would be very enlightening. We
believe that PHD will perform best on large machines with decent hierarchical locality, and
low protocol overhead. Whether or not these conditions will occur for real applications is
unknown.

6.4 Summary

In this chapter we used an embedded model to show the performance of the protocol mapped
onto various architectures. We looked at average inter-transaction issue time and protocol
overhead for different locality parameters, multithreading, and machine sizes.

We determined that multithreading is only useful for approximately two to four threads;
any additional interleaving does not decrease the overall latency. For small machines and
high locality applications, this limitation is due mainly to the length of the running threads.
For large machines with medium to low locality, this limitation is due mainly to the high
protocol overhead.

We discovered that the embedding will work well given fast protocol processing times
and relatively few references to shared memory. In the best case only 9% of all cycles are
taken up by protocol overhead for small machines with 0.9 locality. This increases to 28%
for large machines (32768 processors) with high locality, and 39% for small machines with
poor locality.

With the use of separate cache controllers, we can do even better. For a locality of
0.9, we can reduce the overhead to 1% for up to 32768 processors. For a locality
of 0, we can see as little as 4% overhead for 64 processors, rising rapidly as the number
of processors increases. The gains from the addition of these controllers, however, are not
large enough to justify hardware which is more expensive than processors. In no case does
the addition of the controllers save more time than doubling the number of processors would.

7.1 Summary

This thesis described the Protocol for Hierarchical Directories, a hierarchical,
directory-based cache coherence scheme. PHD supports read, write, and test-and-set operations.
Read requests are satisfied in the smallest subtree containing both the requester and a
copy of the requested block; only three sets of messages are sent up or down that subtree.
Write requests are confined to the subtree containing the lowest common ancestor of the
requester and all copies of the requested block; four sets of messages are sent up and down
the hierarchy, two of which fan out to all nodes with copies. Test-and-set requests are
implemented as an optimized combination of read and write requests, and implement a
test-and-test-and-set operation.

An embedding of PHD into k-ary n-cubes was also proposed and evaluated. The
mapping translates hierarchical locality into physical locality. The mapping also distributes
higher level tree nodes over many physical processors, both to increase bandwidth and to
prevent bottlenecks at the top of the tree.

We built a simulator to experiment with PHD. The simulator implements the full
protocol plus certain extensions, such as local allocation and optional automatic allocation on
uninitialized data. The simulator is trace-driven, and can gather many types of statistics for
studying the protocol. The simulator has been used to test the protocol; additional features
for debugging include printing events and cache emptying events. A special verification
program was also designed to ensure that the protocol kept the memory consistent.

This thesis describes two analytical models: an abstract one and an embedded one.
The abstract model characterizes aspects of the protocol which are not dependent on the
architecture on which the protocol is run and can be used to evaluate other hierarchical
protocols. The embedded model describes the behavior of PHD as it interacts with a
machine which has particular network and processor characteristics. The embedded model
derives its inputs from the outputs of the abstract model.

7.2 Contributions

PHD is scalable in cost and network latency. Unlike other hierarchical protocols, there is
no bottleneck at the top of the hierarchy. The protocol uses fewer hierarchy traversals and
a shorter critical path to satisfy read operations than do other hierarchical protocols. The
protocol supports asynchronous invalidation through the notion of ownership.

We proposed a method of expressing locality in applications mapped onto hierarchical
architectures and successfully used this model to predict the average height per request, the
average longest path per request, and the average number of messages sent per request. We
used three applications in order to validate this abstract model: a uniform reference stream,
a relaxation algorithm, and a clustering data-reference stream. After validating the model,
we employed it in the prediction of the behavior of the protocol on very large hierarchies,
studying how the model results varied with machine size and locality.

This abstract model was used to generate the inputs to an embedded model; the
embedded model described how the protocol runs when mapped onto particular machines. We
looked at average inter-transaction issue time and protocol overhead for different locality
parameters, degrees of multithreading, and machine sizes.

The embedding performs well when the run length between references to shared memory
is at least an order of magnitude less than the time spent to process a protocol state
transition. If separate controllers for processing protocol requests are included, the protocol
scales to 32k processor machines as long as applications exhibit hierarchical locality: at least
22% of the global references must be able to be satisfied locally; at most 35% of the global
references are allowed to reach the top level of the hierarchy. Without the use of separate
controllers, latency cannot be hidden effectively by multithreading because processors spend
too much of their time satisfying protocol requests.
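
As a minimal sketch of the locality condition just stated, the check below tests a reference stream against the two bounds (at least 22% satisfied locally, at most 35% reaching the top level). The function name and argument names are illustrative assumptions and are not taken from the thesis or its simulator.

```python
def meets_locality_bounds(local_fraction, top_fraction,
                          min_local=0.22, max_top=0.35):
    """Return True when both locality bounds quoted in the text hold.

    local_fraction: share of global references satisfied locally.
    top_fraction:   share of global references reaching the top level.
    Both names are hypothetical; only the 22% and 35% bounds come from the text.
    """
    if not (0.0 <= local_fraction <= 1.0 and 0.0 <= top_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    return local_fraction >= min_local and top_fraction <= max_top
```

For example, a stream with 30% local references and 20% top-level references passes, while one with only 10% local references does not.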

7.3 Discussion

This thesis has exposed several major areas of research which should be pursued. The
tradeoffs involved in designing good hierarchical cache coherence protocols should be
characterized. The abstract model of the protocol would benefit from a better understanding
of the locality parameter. With some additional work, the embedded model can be used to
aid in the design of shared-memory machines.

This thesis discussed some of the decisions which were made in the design of a
hierarchical cache coherence system. The effects of these decisions have not been fully explored.
A comparison of PHD and another hierarchical cache coherence protocol would still be
instructive.

Currently, the locality parameter set can only be determined for an application by
simulation. We have performed some initial work towards deriving the set of locality parameters
from a spatial locality model of an application, such as that available for the cluster
application. The derivation works best, however, for applications which exhibit a very high
degree of clustering. More work needs to be done in this area.

All of our large machine studies use a flat set of locality parameters to constrain the
study space. Using a flat set of locality parameters, however, is not necessarily realistic. For
large applications running on massively parallel machines, we might expect less sharing to
occur near the top of the hierarchy, and more at the bottom. Although few large applications
exist today, as ones are written they can be studied in order to determine reasonable locality
parameters.

For applications which have a large variance in the types of data referencing, several
sets of locality parameters can be used, to avoid averaging effects. This would allow one
to separate widely shared data such as synchronization variables from less used ones. This
separation would be useful because an application may stall due to synchronization instead
of normal changed data. This method might also provide new insight into the interactions
between shared data and program execution-time behavior.

Chapter 6 shows that the main limitation of using the mapping proposed in this thesis
to embed the protocol to a k-ary n-cube is the potentially high protocol overhead. Several
ways to fix this problem should be explored. One is to build fast controllers which can
independently process the protocol requests, the approach of the MIT Alewife project [3].
The cost of adding such a controller to the machine must be balanced against the potential
speed benefits.

None of the architectures studied in this thesis was limited by the speed of the network.
This indicates that either there is a processor-network speed mismatch, and that the network
is too fast, or that the protocol is fundamentally too slow. Studies using a very fast controller
with fast processors, or fast controllers and slow processors, could be used to evaluate how
the protocol performance is affected by the layout, and possibly how to build shared-memory
machines.

There are many areas left to be explored. Uppermost in our minds is the question of
whether or not hierarchical protocols will perform better than flat directory schemes, or
even no caching at all, for actual applications. Determining exactly where the tradeoffs are
in complexity, technology, application locality, compilation time, and machine size would be
extremely enlightening. We believe that PHD will perform best on large machines with low
protocol overhead running applications exhibiting hierarchical locality patterns. Whether
or not these conditions will occur for real applications is unknown.

Regardless of what cache coherence scheme is chosen, compilers must be developed to
minimize data sharing. The scheduling of processes and the placement of data will be some
of the most important problems in building massively parallel computer systems.

Nomenclature



Symbol          Description

B               Average message size (in flits).
C               Number of words in a cache line.
E_l             Probability a node accesses blocks owned by a node in its subtree [...].
L               Number of levels in the hierarchy.
M_l             Average time to satisfy a locally satisfiable request to shared memory.
M_r             Average time to process a protocol message invoked on a processor.
N               Number of processors.
N_i             Average network interface overhead.
O               Overhead: fraction of the average transaction issue time not spent running.
R               Average thread run length between successive requests to shared memory.
R_l             Number of reads which reach level l.
[illegible]     Non-network overhead to satisfying a non-local shared-memory request.
[illegible]     Average thread run length between successive non-locally satisfiable requests.
[illegible]     Context switch time.
W_l             Number of writes which reach level l.
Z_r, Z_w, Z     Fraction of locally satisfiable {reads, writes, requests} to shared memory.

Table A.1: Part I of the table listing all of the parameters used by the thesis. Part II is
located on the next page.

Symbol          Description

[illegible]     Branching factor of the hierarchy.
c_l             Number of valid children of a valid node at level l.
[illegible]     Average number of messages in critical path of a non-local shared-memory request.
[illegible]     Average distance a message travels (in hops).
[illegible]     Expected number of hops between a node and its physically distinct parent.
[illegible]     Expected number of hops a message which circumvents the hierarchy will take.
[illegible]     Fraction of all operations by a node which occur to its own data.
f               Number of flits per word.
g_r, g_w, g     Average number of messages per non-local shared-memory {read, write, request}.
h_r, h_w, h     Average height a {read, write, request} is expected to reach.
k               Number of processors per dimension.
k_d             Average distance a message travels in each dimension.
l_r, l_w, l     Longest path traversed during a {read, write, request}.
m_r, m_w, m     Average number of messages sent during a {read, write, request}.
n               Number of dimensions.
p_l             Value of the locality parameter at level l.
[illegible]     Degree of hardware multithreading.
[illegible]     Fraction of reads in the shared-memory reference stream.
[illegible]     Probability of taking a valid path down from level l during node selection.
[illegible]     The average inter-transaction issue time.
[illegible]     Probability that c children of a valid node at level l are valid.
[illegible]     Fraction of writes in the shared-memory reference stream.

Table A.2: Part II of the table listing all of the parameters used by the thesis.
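
The definition of the overhead O in the nomenclature (the fraction of the average transaction issue time not spent running) can be written out as a short calculation. This is only a sketch under the assumption that the running time per transaction equals the run length R; the function and argument names are hypothetical.

```python
def overhead(run_length, issue_time):
    """Fraction of the inter-transaction issue time not spent running.

    Assumes (hypothetically) that a thread runs for `run_length` time units
    per transaction and that `issue_time` is the average inter-transaction
    issue time, so O = (issue_time - run_length) / issue_time.
    """
    if issue_time <= 0:
        raise ValueError("issue_time must be positive")
    if not 0 <= run_length <= issue_time:
        raise ValueError("run_length must lie in [0, issue_time]")
    return (issue_time - run_length) / issue_time
```

With a run length of 25 and an issue time of 100 (arbitrary units), three quarters of the issue time counts as overhead.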



Appendix B

Relaxation Calculations



The height of read and write operations for a given relaxation problem can be exactly
calculated, as mentioned in Chapter 5. This appendix presents the analytical equations for
two and three dimensional relaxation calculations.

Calculating the Characteristics of Read Operations. The first characteristic of an
application that must be understood is the number of read operations to neighboring values
that are local, and the number that cross various levels of the hierarchy. We will first show
a derivation for the 2-dimensional numbers and then the 3-dimensional numbers. Call n
the number of processors per dimension, N the total number of processors, x the number of
data points per processor per dimension, and X the number of data points per processor.
Note that in the rest of the thesis, k is the number of processors; we use n here for simplicity.

As can be seen in Figure B.1, the read references which reach the highest level in the
system, R_{L-1}, will be the ones by points of data abutting the boldest lines, which represent
the division between the four level 3 processors. The number of reads which cross these lines
is 4nx. The number of reads which cross the next highest level is 4(2nx). In general, the
number of reads which cross level l is twice as many as cross level l+1, for all l in [1, L-2].

For three dimensions, the read reference calculation is similar. Here we are dealing with
planes instead of lines. The number of reads which cross the highest level is 6n^2x^2. As in
the 2-dimensional case, the number of reads which cross level l is twice as many as cross
level l+1, for all l in [1, L-2].

[Figure B.1 legend: Level 1, Level 2, Level 3, and Level 4 crossings; Level 1 through Level 4 nodes.]

Figure B.1: The tree is mapped to the processors in such a way that crossing the bolder
lines represents reaching higher levels of the tree.



In fact, we can perform this calculation for an arbitrary d-dimensional embedding. The
L-1 crossing happens for exactly 2dD^{d-1} reads, where D = nx. The reads always double
as the level decreases. The number of read references reaching each level for an arbitrary
dimension is summarized in Equation B.1.

\[
R_l =
\begin{cases}
\text{[illegible]} - \sum_{h=1}^{L-1} R_h & l = 0 \\
2R_{l+1} & l \in [1, L-2] \\
2dD^{d-1} & l = L-1
\end{cases}
\tag{B.1}
\]
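
The doubling rule for read crossings can be sketched in a few lines. The code below assumes the top-level count is 2d(nx)^(d-1), which agrees with the 4nx quoted for two dimensions, and that each dimension is split in half at every level so that n = 2^(L-1). The function name and the `levels` argument are illustrative, and the l = 0 case is left out.

```python
def read_crossings(d, n, x, levels):
    """Reads crossing each level l = 1 .. levels-1 of the hierarchy.

    d: number of dimensions, n: processors per dimension,
    x: data points per processor per dimension, levels: L.
    Assumes n == 2 ** (levels - 1) (binary split per dimension).
    """
    if n != 2 ** (levels - 1):
        raise ValueError("this sketch assumes n == 2 ** (levels - 1)")
    top = 2 * d * (n * x) ** (d - 1)  # crossings of the highest level
    counts = {}
    for level in range(levels - 1, 0, -1):
        # Each step down the hierarchy doubles the number of crossings.
        counts[level] = top * 2 ** (levels - 1 - level)
    return counts
```

For a 4 x 4 processor grid with 4 x 4 points per processor (d = 2, L = 3), this gives 64 reads across level 2 and 128 across level 1.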



Calculating the Characteristics of Write Operations. We are now prepared to calculate
the exact number of writes which must reach a particular height. Instead of summing,
for every point, the heights of its neighbors, we must perform a maximum. As can be seen
in Figure B.2, there are many grid points that have neighbors at varying heights. Data
point a is a typical data point, with all of its neighbors local. The maximum height a write
to this point could reach, therefore, is 0. Data point b has a neighbor which is across a level
1 boundary. Since the rest of its neighbors are local, the maximum height is 1. Both points
c and d have neighbors across level 2 boundaries, so must be counted at height 2. The three
points e, f, and g are similarly counted at height 3, and h, i, j, and k are similarly counted
at 4.

[Figure B.2 legend: Level 1, Level 2, Level 3, and Level 4 crossings.]

Figure B.2: The labeled points represent pieces of data which must be carefully considered
when determining the height that a write to that data will reach.

All points to be counted at L-1 can be easily calculated as W_{L-1} = 4nx - 4. The 4nx
was derived for the read case, and the subtraction of 4 refers to the four cross points, each of
which was double counted in the read formula.¹ To calculate the formula for l in [1, L-2],
we count all of the points on a cross for size l, subtracting out the four center ones as
in the l = L-1 case, and then multiply that quantity by the number of crosses at that
level. We then must subtract off all points which are supposed to be counted as higher-level
points. The resulting formula, which applies only to the two-dimensional case, is given in
Equation B.2. Note that C_l, the number of crossing points associated with each level, is
2^{L-1-l}, and G_l, the number of processors at level l, is C_l^d.

\[
W_l =
\begin{cases}
n^2(x-2)^2 + 4n(x-2) + 4 & l = 0 \\
G_l\left(\frac{4nx}{C_l} - 4\right) - 8\,C_l\,(C_l - 1) & l \in [1, L-2] \\
4nx - 4 & l = L-1
\end{cases}
\tag{B.2}
\]
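
One way to sanity-check the two-dimensional write counts is to enumerate a small grid directly. The sketch below is not from the thesis: it assumes a binary split per dimension (n = 2^(L-1)), takes the level crossed between two adjacent processors to be the position of the highest bit in which their indices differ, and reads the closed form as W_0 = n^2(x-2)^2 + 4n(x-2) + 4, W_l = C_l^2(4nx/C_l - 4) - 8C_l(C_l - 1) with C_l = 2^(L-1-l), and W_{L-1} = 4nx - 4. All function names are illustrative.

```python
from collections import Counter


def crossing_level(i, j, x):
    """Hierarchy level crossed between adjacent coordinates i and j (0 = local)."""
    return ((i // x) ^ (j // x)).bit_length()


def write_heights_2d(n, x):
    """Count grid points by the highest level any in-grid neighbour lies across."""
    size = n * x
    counts = Counter()
    for r in range(size):
        for c in range(size):
            height = 0
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < size and 0 <= cc < size:
                    height = max(height,
                                 crossing_level(r, rr, x),
                                 crossing_level(c, cc, x))
            counts[height] += 1
    return counts


def closed_form_2d(n, x, levels):
    """Closed-form write counts per level (assumes n >= 2 and x >= 2)."""
    top = levels - 1
    result = {0: n**2 * (x - 2)**2 + 4 * n * (x - 2) + 4,
              top: 4 * n * x - 4}
    for level in range(1, top):
        c = 2 ** (top - level)  # crossing points per dimension at this level
        result[level] = c * c * (4 * n * x // c - 4) - 8 * c * (c - 1)
    return result
```

Comparing `dict(write_heights_2d(4, 4))` with `closed_form_2d(4, 4, 3)` is a quick check that the enumeration and the closed form agree, and that the per-level counts add up to the total number of grid points.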



Calculating the number of data points which have completely local neighbors is fairly
simple. Note that we are assuming that points which lie on the boundary are read fewer
times (as many fewer times as neighbors they lack). There are (x-2)^2 completely local
points per node, plus x-2 boundary points per edge node, plus an extra point (the corner
one) on each corner node.

¹ This double count is appropriate for a read, since more than one read occurs to every block.

In the 3-dimensional case we must consider three intersecting planes instead of two
intersecting lines. The number of writes which reach the highest level, however, is still
straightforward: from the read case we know that there are 6n^2x^2 crossings of the three
planes. The three planes intersect at three separate lines, each of which generates four
double-counted grid units per line unit which must be subtracted from the earlier total.
The three lines, however, intersect in one point which has eight double-counted grid units
around it. These eight grid units must be added back to the total, resulting in the formula
given in Equation B.3.

\[
W_l =
\begin{cases}
n^3(x-2)^3 + 6n^2(x-2)^2 + 12n(x-2) + 8 & l = 0 \\
G_l\left(\frac{6n^2x^2}{C_l^2} - \frac{12nx}{C_l} + 8\right) - 24nx\,C_l(C_l-1) + 24\left(C_l^2(C_l-1) + (C_l-1)^2 C_l\right) & l \in [1, L-2] \\
6n^2x^2 - 12nx + 8 & l = L-1
\end{cases}
\tag{B.3}
\]
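
The three-dimensional counts can be checked the same way on a small cube. This is a sketch under stated assumptions, not the thesis's own code: n = 2^(L-1), the crossing level between adjacent processors is the highest differing bit of their indices, C_l = 2^(L-1-l), and G_l = C_l^3. Because a point's write height is the maximum over its three axes, the enumeration only needs a per-coordinate height. Function names are illustrative.

```python
from collections import Counter
from itertools import product


def axis_height(i, n, x):
    """Highest level crossed from coordinate i to an in-grid neighbour on one axis."""
    best = 0
    for j in (i - 1, i + 1):
        if 0 <= j < n * x:
            best = max(best, ((i // x) ^ (j // x)).bit_length())
    return best


def write_heights_3d(n, x):
    """Count points of the (n*x)^3 grid by maximum neighbour crossing level."""
    heights = [axis_height(i, n, x) for i in range(n * x)]
    return Counter(max(triple) for triple in product(heights, repeat=3))


def closed_form_3d(n, x, levels):
    """Closed-form write counts per level (assumes n >= 2 and x >= 2)."""
    top = levels - 1
    nx = n * x
    result = {0: n**3 * (x - 2)**3 + 6 * n**2 * (x - 2)**2 + 12 * n * (x - 2) + 8,
              top: 6 * nx**2 - 12 * nx + 8}
    for level in range(1, top):
        c = 2 ** (top - level)
        side = nx // c  # edge length of one level-`level` subunit
        per_subunit = 6 * side**2 - 12 * side + 8
        result[level] = (c**3 * per_subunit
                         - 24 * nx * c * (c - 1)            # boundary lines removed
                         + 24 * (c**2 * (c - 1) + (c - 1)**2 * c))  # intersections restored
    return result
```

For n = 4 and x = 4 (L = 3) both routes give 1000 local points, 1744 at level 1, and 1352 at level 2, which together account for all 4096 grid points.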



We now calculate the equation for l in [1, L-2] for the 3-dimensional case. First we
consider each level l subunit. There are G_l such subunits. As in the l = L-1 case,
we count all the points along the three planes, subtract off the line ones, and add back
in the center eight ones. We now must account for all the points which are counted
at a higher level. These points are the ones at the boundaries of the subunits. We will
calculate these from looking at the whole cube, not at subunits. Consider a face of the
cube, as in Figure B.3. Each dotted-line cross in the right side of the figure is the edge view
of the 3-dimensional object shown at the left side of the figure. The bold lines on the left
figure indicate which points have been double-counted, and must be subtracted out of the
total. Each diamond and circle in the right figure represents a line that must be subtracted
out. There are C_l(C_l - 1) circle lines, and the exact same number of diamond lines, per
dimension. For every circle or diamond line, 4nx points must be subtracted out.

Figure B.3: Each dotted line cross is the edge of the intersection of two planes. At the
boundaries of these 3-d crosses are lines (the endpoints of which are marked as circles and
diamonds) which contain the points that are supposed to be counted at a higher level.

Figure B.4: The circles and diamonds represent the endpoints of lines. The intersections of
these lines have been doubly subtracted in our total, and must be added back.

After subtracting out all of those points, we still do not have the correct equation.
Everywhere the circle and diamond lines intersected, we double-subtracted points, and we
must now add them back. Consider Figure B.4. Study the front and top faces of the cube.
We can see that per front column of circles, [...] front circles intersect [...] top circles.
There are [...] front columns of circles. Similarly, per front column of diamonds, [...]
diamonds intersect [...] top diamonds. There are [...] front columns of diamonds. This set
of intersections occurs once for every pair of dimensions (i.e. three times), and generates
eight points to be added back per crossing. The resulting formula was already shown in
Equation B.3.

We now calculate the number of data points which have completely local neighbors for
the 3-dimensional case. There are (x-2)^3 completely local points per node, plus (x-2)^2
boundary points per face of the cube node, plus x-2 boundary points per edge node, plus
an extra point (the corner one) on each corner node. Again, the resulting formula was
already shown in Equation B.3.



Appendix C

Table of Protocol Behavior



The following sections detail the behavior of the Protocol for Hierarchical Directories. The
first describes the transitions for leaf nodes, and the second describes the transitions for
parent nodes.

C.1 Leaf Node Transition Table

The leaf node transitions are a function of the current state and the input message. For
every such combination, there is a possible new state to transition to as well as a possible
message to send. The possible states are enumerated in Table C.1. Table C.2 lists all of the
messages that can be received by a leaf node. These messages are explained in Table 2.4.
Table C.3 explains all of the symbols used in the transition table. The messages which can
be sent by a leaf node are listed in C.4, and a further expansion of the abbreviations is
listed in Table C.7. The actual transition table is split onto two pages in Table C.5.

Symbol            Expansion

invalid           invalid
r_yo_npl          readable_yowner
r_no_npl          readable_nowner
wfr_no_npl        waiting_for_read
w_yo_npl          writable
wfw_no_npl_nr     waiting_for_write_nowner_npl_nread
wfw_no_npl_yr     waiting_for_write_nowner_npl_yread
wfw_yo_npl        waiting_for_write_yowner_npl
wfw_no_ypl_nr     waiting_for_write_nowner_ypl_nread
wfw_no_ypl_yr     waiting_for_write_nowner_ypl_yread
wfwo_yo_npl       waiting_for_write_ok_yowner_npl
wfwo_yo_ypl       waiting_for_write_ok_yowner_ypl
wfwv_no_ypl_nr    waiting_for_write_value_nowner_ypl_nread
wfwv_no_ypl_yr    waiting_for_write_value_nowner_ypl_yread
wft_no_npl_nr     waiting_for_tas_nowner_npl_nread
wft_no_npl_yr     waiting_for_tas_nowner_npl_yread
wft_yo_npl        waiting_for_tas_yowner_npl
wft_no_ypl_nr     waiting_for_tas_nowner_ypl_nread
wft_no_ypl_yr     waiting_for_tas_nowner_ypl_yread
wfto_yo_npl       waiting_for_tas_ok_yowner_npl
wfto_yo_ypl       waiting_for_tas_ok_yowner_ypl
wftv_no_ypl_nr    waiting_for_tas_value_nowner_ypl_nread
wftv_no_ypl_yr    waiting_for_tas_value_nowner_ypl_yread

Table C.1: The abbreviations for states used in the leaf transition table.



Symbol    Expansion

dr        read request
r         read
rd        read_data
dw        write request
l         lock
swo       s_write_own
wo        write_ok
dt        tas request
rt        read_tas
tf        tas_failed

Table C.2: The abbreviations for input messages and requests used in the leaf transition
table.






Symbol         Expansion

•              Stay in the same state.
NUMBER         Go to the state numbered NUMBER.
D              Put message in delaying queue.
X              Error.
DX             X if we issue only one request at a time, D otherwise.
z: ACTION      If value of the block is zero then ACTION.
mrq: ACTION    If current node originated the request then ACTION.
!: ACTION      Else ACTION.

Table C.3: The abbreviations for symbols used in the leaf transition table.
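
The conditional notation of Table C.3 can be illustrated with a small resolver. This is a hypothetical sketch rather than the simulator's code: it assumes a table cell is an ordered list of (guard, action) pairs, that the guard "!" means "otherwise", and that "mrq" is the guard for "the current node originated the request". The function and keyword names are invented for the example.

```python
def resolve_action(clauses, *, block_is_zero=False, is_requester=False,
                   single_outstanding=True):
    """Return the action of the first clause whose guard holds.

    clauses: list of (guard, action) pairs, where guard is "z", "mrq",
    "!" (assumed to mean "otherwise"), or None for an unconditional entry.
    The DX entry expands to X when only one request is issued at a time,
    and to D otherwise.
    """
    for guard, action in clauses:
        holds = (guard is None
                 or guard == "!"
                 or (guard == "z" and block_is_zero)
                 or (guard == "mrq" and is_requester))
        if holds:
            if action == "DX":
                return "X" if single_outstanding else "D"
            return action
    raise ValueError("no clause applies")
```

For instance, an entry written as "mrq: X, !: a" resolves to X at the node that originated the request and to a everywhere else.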



Symbol    Expansion

fr        Send flcfr up to parent.
rfr       Send rflcfr up to parent.
fw        Send flcfw up to parent.
a         Send a up to parent.
al        Send al up to parent.
ft        Send flcft up to parent.
rft       Send rflcft up to parent.
cv        Send cv up to parent.
rd        Send rd to the reader.
swo       Send swo to the writer.
tf        Send tf to the tas requester.

Table C.4: The abbreviations for output messages used in the leaf transition table.
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dr 


r 


rd 


d^w 


1 


S^HD 


W) 


dt 


rt 


tf 





i nval i d 


fr 
3 


rf r 


X 


fw 
5 


irrq: X 
!:a. 


X 


X 


ft 

14 


rft 


X 


1 


r_yo_npl 




rd 


X 


fw 

7 


S^HD a 




X 


X 


ft 
16 


z: f w. 
!:tf 


X 


2 


r_rio_npl 




rd 


X 


fw 
6 


a 



X 


X 


ft 
15 


z: f w. 
!:tf 


X 


3 


wf r_rio_npl 


DX 


rf r 


cv 
2 


DX 


D 


X 


X 


X 


rft 


X 


4 


w_yo_npl 




rd 
1 


X 




S^HD a 




X 


X 


• 


z: f w. 
!:tf 


X 


5 


wf wjiojipl _rir 


DX 


rf r 


X 


DX 


irrq: al 

8 

!:a. 


10 


X 


DX 


rft 


X 


6 


wf w_rio_npl _yr 


DX 


rd 


X 


DX 


irrq: al 

9 

!:a5 


10 


X 


DX 


rft 


X 


7 


■wf w_yojipl 


DX 


rd 


X 


DX 


irrq: sw) al 
9 
! : s^HD a 5 


10 


X 


DX 


rft 


X 


8 


wf w_rio_ypl _nr 


DX 


rf r 


X 


DX 


nrq: X 
!:D. 


11 


12 


DX 


rft 


X 


9 


■wf w_rio_ypl _yr 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


11 


13 


DX 


rft 


X 


10 


wf w)_yo_npl 


DX 


rd 


X 


DX 


nrq: al 
11 
!:X 


X 


X 


DX 


rft 


X 


11 


■wf -HD-yo-ypl 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


X 


4 


DX 


rft 


X 


12 


■wf wvjio.ypl _nr 


DX 


rf r 


X 


DX 


nrq: X 
!:D. 


4 


X 


DX 


rft 


X 


13 


■wf ■wvjio.ypl _yr 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


4 


X 


DX 


rft 


X 
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dr 


r 


rd 


dw 


1 


SW) 


W) 


dt 


rt 


tf 


14 


wf t _rio_npl _rir 


DX 


rf r 


X 


DX 


nrq: al 

17 

!:a. 


19 


X 


DX 


rft 





15 


wf t Jiojipl _yr 


DX 


rd 


X 


DX 


nrq: al 

18 
! :a 14 


19 


X 


DX 


rft 





16 


wf t _yo_npl 


DX 


rd 


X 


DX 


nrq: sw) al 

18 
! : sw) a 14 


19 


X 


DX 


rft 





17 


wf t _rio ypl nr 


DX 


rf r 


X 


DX 


nrq: X 
!:D. 


20 


21 


DX 


rft 





18 


wf t _rio_ypl _yr 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


20 


22 


DX 


rft 





19 


wf to_yojipl 


DX 


rd 


X 


DX 


nrq: al 
20 
!:X 


X 


X 


DX 


rft 


X 


20 


■wf to_yo_ypl 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


X 


4 


DX 


rft 


X 


21 


wf t v_rio_ypl _nr 


DX 


rf r 


X 


DX 


nrq: X 
!:D. 


4 


X 


DX 


rft 


X 


22 


wf t v_rio_ypl _yr 


DX 


rd 


X 


DX 


nrq: X 
!:D. 


4 


X 


DX 


rft 


X

C.2 Parent Node Transition Table

The state of a parent (non-leaf) node includes the full vector describing its child subtrees.
For this reason, the transition table must be collapsed (just eight subtrees increases the total
number of states by a factor of [...]) in order to express it in a reasonable amount of room.
The table therefore takes three inputs: the current state (which does not include the state
of the subtree vector), the subtree vector combination, and the input message. In response
to a message, a node may send a message, perform an action, change its own state, or any
combination of the above. All of these responses may be modified by a conditional expression
further specifying the state of the node. The table additionally contains assertions about
the state of a node for some of the entries. These assertions are not required to implement
the protocol, but are useful to understand what must be happening when a node reaches a
particular state.

The list of states is enumerated in Table 2.3. The possible vector combinations are
listed in C.6. The messages that a parent node might receive are listed in Table C.7; an
explanation of these messages is in Table 2.4. The list of actions a node may perform is
enumerated in Table C.8. Table C.9 lists the messages that might be sent by a parent node.
Table C.10 explains all of the assertions and predicates used in the transition table. The
actual node transition table spans multiple pages, and is referred to as Table C.11.
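
To make the collapsing step concrete, the sketch below counts how many distinct subtree vectors exist and maps a vector onto the combination symbols of Table C.6 (c, v, vw, vc, vwc). It assumes each subtree is independently invalid, valid, waiting, or confirmed, which is how the table describes subtrees; the function names are illustrative, and the choice to return the most specific listed symbol is an assumption of this example.

```python
STATUSES = ("invalid", "valid", "waiting", "confirmed")
LISTED = {"c", "v", "vw", "vc", "vwc"}  # combination symbols in Table C.6


def expanded_state_factor(subtrees):
    """Number of distinct subtree vectors for the given number of subtrees."""
    return len(STATUSES) ** subtrees


def vector_combination(vector):
    """Most specific Table C.6 symbol for a subtree vector, or None.

    None is returned when no listed symbol fits exactly, for example an
    all-invalid vector or one whose only non-invalid status is "waiting".
    """
    unknown = set(vector) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown subtree status: {sorted(unknown)}")
    key = "".join(tag for tag, name in (("v", "valid"),
                                        ("w", "waiting"),
                                        ("c", "confirmed"))
                  if name in vector)
    return key if key in LISTED else None
```

Under the four-status assumption, eight subtrees give 4^8 = 65536 possible vectors, which is why the table works with a handful of combinations instead.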



Symbol    Expansion    Meaning

c         v0_w0_cX     All subtrees are either confirmed or invalid.
v         vX_w0_c0     All subtrees are either valid or invalid.
vw        vX_wX_c0     All subtrees are valid, waiting, or invalid.
vc        vX_w0_cX     All subtrees are valid, confirmed or invalid.
vwc       vX_wX_cX     All subtrees are valid, waiting, confirmed or invalid.

Table C.6: The abbreviations for the vector combinations used in the parent transition
table.

Symbol    Expansion

flcfr     find_lowest_common_for_read
rflcfr    redirected_find_lowest_common_for_read
r         read
flcfw     find_lowest_common_for_write
l         lock
a         ack
al        ackl
ta        throwing_away
cte       change_to_exclusive
flcft     find_lowest_common_for_tas
rflcft    redirected_find_lowest_common_for_tas
cv        confirm_value
rd        read_data
uncv      unconfirm_value
rt        read_tas
wo        write_ok

Table C.7: The abbreviations for input messages used in the parent transition table.



Symbol         Expansion

L              Lock this node and change the writer in [...]
[illegible]    Change the sending subtree to valid.
[illegible]    Change the sending subtree to confirmed.
[illegible]    Change the sending subtree to invalid.
[illegible]    Change the sending subtree to waiting.
[illegible]    Change this node's status to exclusive.
[illegible]    Change this node's status to shared.

Table C.8: The abbreviations for actions used in the parent transition table.

Symbol       Expansion

flcfr        Send flcfr up to parent.
rflcfr       Send rflcfr up to parent.
r            Send r down to randomly chosen confirmed subtree.
flcfw        Send flcfw up to parent.
l            Send l down to writing subtree.
a            Send a up to parent.
al           Send al up to parent.
ta           Send ta up to parent.
cte          Send cte down to only non-invalid subtree.
flcft        Send flcft up to parent.
rflcft       Send rflcft up to parent.
cv           Send cv up to parent.
uncv         Send uncv up to parent.
rt           Send rt down to randomly chosen confirmed subtree.
wo           Send wo down to only non-invalid subtree.
rdav,c       Send rd down to all valid, making them confirmed.
rdavw,c      Send rd down to all valid or waiting, making them confirmed.
rdaw,c       Send rd down to all waiting, making them confirmed.
rdavL,c      Level 1: Send rd down to all valid except locker,
             making them confirmed.
             Level >1: Send rd down to all valid including locker,
             making all but locker confirmed.
rdavwL,c     Level 1: Send rd down to all valid or waiting except locker,
             making them confirmed.
             Level >1: Send rd down to all valid or waiting including locker,
             making all but locker confirmed.
rl           Send r down to the locking subtree.
la           Send l down to all not-invalid subtrees and the writing subtree.

Table C.9: The abbreviations for output messages used in the parent transition table.

Symbol    Expansion

rC        Sending subtree is not the only confirmed subtree.
rV        Sending subtree is not the only valid subtree.
iW        Sending subtree is not waiting.
lL        Sending subtree is not the locker of this node.
L i       Sending subtree is invalid.
L VWC     Sending subtree is valid, waiting, or confirmed.
L VW      Sending subtree is valid or waiting.
L VC      Sending subtree is valid or confirmed.
L C       Sending subtree is confirmed.
L V       Sending subtree is valid.
ll        Sending subtree is the locker of this node.
svc       Subtree of requester of operation is valid or confirmed.
sv        Subtree of requester of operation is valid.
sc        Subtree of requester of operation is confirmed.
lv        Locker's subtree is valid.
lc        Locker's subtree is confirmed.
lW        Locker's subtree is not waiting.
0c        Zero subtrees are confirmed.
0C        At least one subtree is confirmed.
0v        Zero subtrees are valid.
0V        At least one subtree is valid.
T         This node is not the top level node in the hierarchy.
t         This node is the top level in the hierarchy.
1         This node is at level one.
ll        This node is above level one.
1vwc      Exactly one subtree is valid, waiting, or confirmed.
1vw       Exactly one subtree is valid or waiting.
1vc       Exactly one subtree is valid or confirmed.
1v        Exactly one subtree is valid.
1c        Exactly one subtree is confirmed.
ORP       This node is on the request path of the current operation.
NRP       This node is not on the request path of the current operation.
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FLCFR 


RFLCFR 


R 


FLCFW 


INVALID 0
if(t){
    alloc;
    next_state 2;
}else{
    do +v;
    send flcfr;
    next_state 8;
}

INVALID 0
assert T;
assert NRP;
send rflcfr;
next_state .;

INVALID 0
assert T;
send rflcfr;
next_state .;

INVALID 0
if(t){
    alloc;
    next_state 2;
}else{
    send flcfw;
    next_state .;
}

C S_U_NOP_NGA 1
assert rC;
do +v;
send r;
next_state 20;

C S_U_NOP_NGA 1
if(ORP){
    assert sc;
}
send r;
next_state .;

C S_U_NOP_NGA 1
send r;
next_state .;

C S_U_NOP_NGA 1
assert T;
send flcfw;
next_state .;

C E_U_NOP_NGA 2
assert rC;
do +v;
send r;
next_state 21;

C E_U_NOP_NGA 2
if(ORP){
    assert sc;
}
send r;
next_state .;

C E_U_NOP_NGA 2
do +s;
send r;
next_state 1;

C E_U_NOP_NGA 2
do L;
send la;
if(ii){
    do +v;
    next_state 24;
}else{
    next_state 5;
}

C S_L_NOP_NGA 3
if(ii){
    send Ej
    next_state .;
}else{
    assert rC;
    do +v;
    send r;
    next_state 22;
}

C S_L_NOP_NGA 3
if(ORP){
    assert sc;
}
send r;
next_state .;

C S_L_NOP_NGA 3
send r;
next_state .;

C S_L_NOP_NGA 3
send Et
next_state .;

C S_L_YOP_NGA 4
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state 23;
}

C S_L_YOP_NGA 4
if(ORP){
    assert sc;
}
send r;
next_state .;

C S_L_YOP_NGA 4
send r;
next_state .;

C S_L_YOP_NGA 4
send Et
next_state .;









Table C.11: The transition table for parent nodes.
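Each cell of the transition table is a small program that runs when a node in a given state receives a given message, and ends by naming the next state. The sketch below shows one way such a table could be driven in Python. It encodes only the INVALID row's FLCFR cell as read from the table; the `Node` class, the `TRANSITIONS` dictionary, the `step` function, and the action log are hypothetical scaffolding, and treating `do +v` and `send flcfr` as opaque logged actions is an assumption rather than the thesis's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    state: int = 0                 # 0 is the INVALID row of the table
    is_top: bool = False           # condition `t`: node is the top level
    log: list = field(default_factory=list)  # actions performed, in order


def invalid_on_flcfr(node):
    """INVALID row, FLCFR column, as transcribed from the table."""
    if node.is_top:
        node.log.append("alloc")
        return 2
    node.log.append("do +v")
    node.log.append("send flcfr")
    return 8


# (state, message) -> handler that performs the cell's actions and returns
# the next state. A handler may return None to mean "stay in this state",
# which is how `next_state .` is assumed to read.
TRANSITIONS = {
    (0, "FLCFR"): invalid_on_flcfr,
}


def step(node, message):
    """Run the table cell for the node's current state and this message."""
    handler = TRANSITIONS.get((node.state, message))
    if handler is None:
        raise KeyError(f"no entry for state {node.state}, message {message}")
    next_state = handler(node)
    if next_state is not None:
        node.state = next_state
    return node.state


inner = Node()
top = Node(is_top=True)
print(step(inner, "FLCFR"), inner.log)
print(step(top, "FLCFR"), top.log)
```

Keeping the cells as functions in a dictionary keyed by (state, message) mirrors the row and column layout of the table, so further cells can be added without touching `step`.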
FLCFR    RFLCFR    R    FLCFW


C E_L_YOP_NGA 5
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state 24;
}

C E_L_YOP_NGA 5
if(ORP){
    assert sc;
}
send r;
next_state .;

C E_L_YOP_NGA 5
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

C E_L_YOP_NGA 5
send Ej
next_state .;

C S_L_YOP_YGA 6
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state 25;
}

C S_L_YOP_YGA 6
if(ORP){
    assert sc;
}
send r;
next_state .;

C S_L_YOP_YGA 6
send r;
next_state .;

C S_L_YOP_YGA 6
send Ej
next_state .;

C E_L_YOP_YGA 7
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state 26;
}

C E_L_YOP_YGA 7
if(ORP){
    assert sc;
}
send r;
next_state .;

C E_L_YOP_YGA 7
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

C E_L_YOP_YGA 7
send Ej
next_state .;

V S_U_NOP_NGA 8
assert rV;
do +w;
next_state 14;

V S_U_NOP_NGA 8
assert T;
if(ORP){
    assert sv;
}
send rflcfr;
next_state .;

V S_U_NOP_NGA 8
assert T;
send rflcfr;
next_state .;

V S_U_NOP_NGA 8
assert T;
send flcfw;
next_state .;

V S_L_NOP_NGA 9
if(ii){
    send Ej
    next_state .;
}else{
    assert i;
    send flcfr;
    next_state .;
}

V S_L_NOP_NGA 9
assert T;
if(ORP){
    assert sv;
}
send rflcfr;
next_state .;

V S_L_NOP_NGA 9
assert T;
send rflcfr;
next_state .;

V S_L_NOP_NGA 9
send Ej
next_state .;
FLCFR    RFLCFR    R    FLCFW


V S_L_YOP_NGA 10
if(ii){
    send Ej
    next_state .;
}else{
    if(t-il){
        send rl;
        next_state .;
    }else{
        send flcfr;
        next_state .;
    }
}

V S_L_YOP_NGA 10
if(ORP){
    assert sv;
}
if(t){
    send rl;
}else{
    send rflcfr;
}
next_state .;

V S_L_YOP_NGA 10
send rl;
next_state .;

V S_L_YOP_NGA 10
send Et
next_state .;

V E_L_YOP_NGA 11
if(ii){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

V E_L_YOP_NGA 11
if(ORP){
    assert sv;
}
send rl;
next_state .;

V E_L_YOP_NGA 11
if(NRP){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

V E_L_YOP_NGA 11
send Et
next_state .;

V S_L_YOP_YGA 12
if(ii){
    send Et
    next_state .;
}else{
    if(t-H){
        send rl;
        next_state .;
    }else{
        send flcfr;
        next_state .;
    }
}

V S_L_YOP_YGA 12
if(ORP){
    assert sv;
}
if(t){
    send rl;
}else{
    send rflcfr;
}
next_state .;

V S_L_YOP_YGA 12
send rl;
next_state .;

V S_L_YOP_YGA 12
send Et
next_state .;

V E_L_YOP_YGA 13
if(ii){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

V E_L_YOP_YGA 13
if(ORP){
    assert sv;
}
send rl;
next_state .;

V E_L_YOP_YGA 13
if(NRP){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

V E_L_YOP_YGA 13
send Et
next_state .;

VW S_U_NOP_NGA 14
assert rV;
assert iW;
do +w;
next_state .;

VW S_U_NOP_NGA 14
assert T;
if(ORP){
    assert sv;
}
send rflcfr;
next_state .;

VW S_U_NOP_NGA 14
assert T;
send rflcfr;
next_state .;

VW S_U_NOP_NGA 14
assert T;
send flcfw;
next_state .;
FLCFR    RFLCFR    R    FLCFW


VW S_L_NOP_NGA 15
if(ii){
    send Ej
    next_state .;
}else{
    assert rV;
    do +w;
    next_state .;
}

VW S_L_NOP_NGA 15
assert T;
if(ORP){
    assert sv;
}
send rflcfr;
next_state .;

VW S_L_NOP_NGA 15
assert T;
send rflcfr;
next_state .;

VW S_L_NOP_NGA 15
send Et
next_state .;

VW S_L_YOP_NGA 16
if(ii){
    send Ej
    next_state .;
}else{
    assert iW;
    do +w;
    next_state .;
}

VW S_L_YOP_NGA 16
if(ORP){
    assert sv;
}
if(t){
    send rl;
}else{
    send rflcfr;
}
next_state .;

VW S_L_YOP_NGA 16
send rl;
next_state .;

VW S_L_YOP_NGA 16
send Et
next_state .;

VW E_L_YOP_NGA 17
if(ii){
    send Et
    next_state .;
}else{
    assert iW;
    do +w;
    next_state .;
}

VW E_L_YOP_NGA 17
if(ORP){
    assert sv;
}
send rl;
next_state .;

VW E_L_YOP_NGA 17
if(NRP){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

VW E_L_YOP_NGA 17
send Et
next_state .;

VW S_L_YOP_YGA 18
if(ii){
    send Et
    next_state .;
}else{
    assert iW;
    do +w;
    next_state .;
}

VW S_L_YOP_YGA 18
if(ORP){
    assert sv;
}
if(t){
    send rl;
}else{
    send rflcfr;
}
next_state .;

VW S_L_YOP_YGA 18
send rl;
next_state .;

VW S_L_YOP_YGA 18
send Et
next_state .;

VW E_L_YOP_YGA 19
if(ii){
    send Et
    next_state .;
}else{
    assert iW;
    do +w;
    next_state .;
}

VW E_L_YOP_YGA 19
if(ORP){
    assert sv;
}
send rl;
next_state .;

VW E_L_YOP_YGA 19
if(NRP){
    send Et
    next_state .;
}else{
    send rl;
    next_state .;
}

VW E_L_YOP_YGA 19
send Et
next_state .;
FLCFR    RFLCFR    R    FLCFW


VC S_U_NOP_NGA 20
assert rV;
do +w;
next_state 27;

VC S_U_NOP_NGA 20
if(ORP){
    assert svc;
}
send r;
next_state .;

VC S_U_NOP_NGA 20
send r;
next_state .;

VC S_U_NOP_NGA 20
assert T;
send flcfw;
next_state .;

VC E_U_NOP_NGA 21
assert rV;
do +w;
next_state 28;

VC E_U_NOP_NGA 21
if(ORP){
    assert svc;
}
send r;
next_state .;

VC E_U_NOP_NGA 21
do +s;
send r;
next_state 20;

VC E_U_NOP_NGA 21
do L;
if(ii){
    do +v;
}
send la;
next_state 24;

VC S_L_NOP_NGA 22
if(ii){
    send Ej
    next_state .;
}else{
    assert rC;
    do +v;
    send r;
    next_state .;
}

VC S_L_NOP_NGA 22
if(ORP){
    assert svc;
}
send r;
next_state .;

VC S_L_NOP_NGA 22
send r;
next_state .;

VC S_L_NOP_NGA 22
send Ej
next_state .;

VC S_L_YOP_NGA 23
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VC S_L_YOP_NGA 23
if(ORP){
    assert svc;
}
send r;
next_state .;

VC S_L_YOP_NGA 23
send r;
next_state .;

VC S_L_YOP_NGA 23
send Et
next_state .;

VC E_L_YOP_NGA 24
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VC E_L_YOP_NGA 24
if(ORP){
    assert svc;
}
send r;
next_state .;

VC E_L_YOP_NGA 24
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

VC E_L_YOP_NGA 24
send Et
next_state .;
FLCFR    RFLCFR    R    FLCFW


VC S_L_YOP_YGA 25
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VC S_L_YOP_YGA 25
if(ORP){
    assert svc;
}
send r;
next_state .;

VC S_L_YOP_YGA 25
send r;
next_state .;

VC S_L_YOP_YGA 25
send E)
next_state .;

VC E_L_YOP_YGA 26
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VC E_L_YOP_YGA 26
if(ORP){
    assert svc;
}
send r;
next_state .;

VC E_L_YOP_YGA 26
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

VC E_L_YOP_YGA 26
send E)
next_state .;

VWC S_U_NOP_NGA 27
assert rV;
assert iW;
do +w;
next_state .;

VWC S_U_NOP_NGA 27
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC S_U_NOP_NGA 27
send r;
next_state .;

VWC S_U_NOP_NGA 27
assert T;
send flcfw;
next_state .;

VWC E_U_NOP_NGA 28
assert rV;
assert iW;
do +w;
next_state .;

VWC E_U_NOP_NGA 28
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC E_U_NOP_NGA 28
do +s;
send r;
next_state 27;

VWC E_U_NOP_NGA 28
do L;
if(ii){
    do +v;
}
send la;
next_state 31;

VWC S_L_NOP_NGA 29
if(ii){
    send Ej
    next_state .;
}else{
    assert rC;
    do +v;
    send r;
    next_state .;
}

VWC S_L_NOP_NGA 29
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC S_L_NOP_NGA 29
send r;
next_state .;

VWC S_L_NOP_NGA 29
send Ej
next_state .;
FLCFR    RFLCFR    R    FLCFW


VWC S_L_YOP_NGA 30
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VWC S_L_YOP_NGA 30
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC S_L_YOP_NGA 30
send r;
next_state .;

VWC S_L_YOP_NGA 30
send Et
next_state .;

VWC E_L_YOP_NGA 31
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VWC E_L_YOP_NGA 31
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC E_L_YOP_NGA 31
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

VWC E_L_YOP_NGA 31
send Et
next_state .;

VWC S_L_YOP_YGA 32
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VWC S_L_YOP_YGA 32
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC S_L_YOP_YGA 32
send r;
next_state .;

VWC S_L_YOP_YGA 32
send Et
next_state .;

VWC E_L_YOP_YGA 33
if(ii){
    send Ej
    next_state .;
}else{
    do +v;
    send r;
    next_state .;
}

VWC E_L_YOP_YGA 33
if(ORP){
    assert svc;
}
send r;
next_state .;

VWC E_L_YOP_YGA 33
if(NRP){
    send Ej
    next_state .;
}else{
    send r;
    next_state .;
}

VWC E_L_YOP_YGA 33
send Et
next_state .;
L    A    Al    TA


INVALID 0
assert Ij
if(ORP){
    do L;
    send l;
    next_state 4;
}else{
    send a;
    next_state .;
}

INVALID 0
next_state X;

INVALID 0
next_state X;

INVALID 0
next_state X;

C S_U_NOP_NGA 1
do L;
if(NRP){
    next_state 3;
}else{
    if(ii){
        do +v;
        next_state 23;
    }else{
        next_state 4;
    }
}
send la;

C S_U_NOP_NGA 1
next_state X;

C S_U_NOP_NGA 1
next_state X;

C S_U_NOP_NGA 1
assert ic;
do +i;
if(Oc){
    assert T;
    send ta;
    next_state 0;
}else{
    next_state .;
}

C E_U_NOP_NGA 2
do L;
if(NRP){
    next_state 3;
}else{
    if(ii){
        do +v;
        next_state 23;
    }else{
        next_state 4;
    }
}
send la;

C E_U_NOP_NGA 2
next_state X;

C E_U_NOP_NGA 2
next_state X;

C E_U_NOP_NGA 2
assert ic;
do +i;
assert OC;
if(>l&lvw;){
    send cte;
}
next_state .;

C S_L_NOP_NGA 3
send Ej
next_state .;

C S_L_NOP_NGA 3
assert ic;
do +i;
if(Oc){
    assert T;
    send a;
    next_state 0;
}else{
    next_state .;
}

C S_L_NOP_NGA 3
next_state X;

C S_L_NOP_NGA 3
assert ic;
do +v;
if(Oc){
    assert T;
    send uncv;
    next_state 9;
}else{
    next_state 22;
}
L    A    Al    TA


C S_L_YOP_NGA 4
send Ej
next_state .;

C S_L_YOP_NGA 4
assert ic;
do +i;
assert OC;
next_state .;

C S_L_YOP_NGA 4
assert ic;
do +c;
if(lc){
    assert T;
    send al;
    next_state 3;
}else{
    next_state 6;
}

C S_L_YOP_NGA 4
assert 1 Y(
assert ic;
do +v;
if(Oc){
    if(1){
        send uncv;
    }
    next_state 10;
}else{
    next_state 23;
}

C E_L_YOP_NGA 5
send E)
next_state .;

C E_L_YOP_NGA 5
assert ic;
do +i;
assert OC;
next_state .;

C E_L_YOP_NGA 5
assert ic;
do +c;
if(lc){
    send wd;
    next_state 2;
}else{
    next_state 7;
}

C E_L_YOP_NGA 5
assert 1 AY
assert ic;
do +v;
if(Oc){
    next_state 11;
}else{
    next_state 24;
}

C S_L_YOP_YGA 6
send E)
next_state .;

C S_L_YOP_YGA 6
assert ic;
do +i;
if(lc){
    assert T;
    send al;
    next_state 3;
}else{
    next_state .;
}

C S_L_YOP_YGA 6
next_state X;

C S_L_YOP_YGA 6
assert 1 AY
assert ic;
do +v;
if(Oc){
    if(1){
        send uncv;
    }
    next_state 12;
}else{
    next_state 25;
}

C E_L_YOP_YGA 7
send E)
next_state .;

C E_L_YOP_YGA 7
assert ic;
do +i;
if(lc){
    send wd;
    next_state 2;
}else{
    next_state .;
}

C E_L_YOP_YGA 7
next_state X;

C E_L_YOP_YGA 7
assert 1 AY
assert ic;
do +v;
if(Oc){
    next_state 13;
}else{
    next_state 26;
}
L    A    Al    TA


V S_U_NOP_NGA 8
do L;
if(NRP){
    send la;
    next_state 9;
}else{
    if(ii){
        do +v;
    }
    send la;
    next_state 10;
}

V S_U_NOP_NGA 8
next_state X;

V S_U_NOP_NGA 8
next_state X;

V S_U_NOP_NGA 8
next_state X;

V S_L_NOP_NGA 9
send Ej
next_state .;

V S_L_NOP_NGA 9
assert iv;
do +i;
if(Ov){
    assert T;
    send a;
    next_state 0;
}else{
    next_state .;
}

V S_L_NOP_NGA 9
next_state X;

V S_L_NOP_NGA 9
next_state X;

V S_L_YOP_NGA 10
send Ej
next_state .;

V S_L_YOP_NGA 10
assert iv;
do +i;
assert OV;
next_state .;

V S_L_YOP_NGA 10
assert iv;
do +c;
if(lvc){
    assert T;
    send al;
    next_state 3;
}else{
    next_state 25;
}

V S_L_YOP_NGA 10
next_state X;

V E_L_YOP_NGA 11
send Ej
next_state .;

V E_L_YOP_NGA 11
assert iv;
do +i;
assert OV;
next_state .;

V E_L_YOP_NGA 11
assert iv;
do +c;
if(lvc){
    send wd;
    next_state 2;
}else{
    next_state 26;
}

V E_L_YOP_NGA 11
next_state X;
L    A    Al    TA


V S_L_YOP_YGA 12
send Ej
next_state .;

V S_L_YOP_YGA 12
assert iv;
do +i;
if(lv){
    assert T;
    assert 1 v;
    do +c;
    send al;
    next_state 3;
}else{
    next_state .;
}

V S_L_YOP_YGA 12
next_state X;

V S_L_YOP_YGA 12
next_state X;

V E_L_YOP_YGA 13
send Ej
next_state .;

V E_L_YOP_YGA 13
assert iv;
do +i;
if(lv){
    assert 1 v;
    do +c;
    send wd;
    next_state 2;
}else{
    next_state .;
}

V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
next_state X;

VW S_U_NOP_NGA 14
do L;
if(NRP){
    send la;
    next_state 15;
}else{
    if(ii){
        do +v;
    }
    send la;
    next_state 16;
}

VW S_U_NOP_NGA 14
next_state X;

VW S_U_NOP_NGA 14
next_state X;

VW S_U_NOP_NGA 14
next_state X;

VW S_L_NOP_NGA 15
send Ej
next_state .;

VW S_L_NOP_NGA 15
assert iv;
do +i;
assert OV;
next_state .;

VW S_L_NOP_NGA 15
next_state X;

VW S_L_NOP_NGA 15
next_state X;

VW S_L_YOP_NGA 16
send Ej
next_state .;

VW S_L_YOP_NGA 16
assert iv;
do +i;
assert OV;
next_state .;

VW S_L_YOP_NGA 16
assert iv;
do +c;
assert OV;
next_state 32;

VW S_L_YOP_NGA 16
next_state 5;
L    A    Al    TA


VW E_L_YOP_NGA 17
send Ej
next_state .;

VW E_L_YOP_NGA 17
assert iv;
do +i;
assert OV;
next_state .;

VW E_L_YOP_NGA 17
assert iv;
do +c;
assert OV;
next_state 33;

VW E_L_YOP_NGA 17
next_state X;

VW S_L_YOP_YGA 18
send Ej
next_state .;

VW S_L_YOP_YGA 18
assert iv;
do +i;
assert OV;
next_state .;

VW S_L_YOP_YGA 18
next_state X;

VW S_L_YOP_YGA 18
next_state X;

VW E_L_YOP_YGA 19
send Ej
next_state .;

VW E_L_YOP_YGA 19
assert iv;
do +i;
assert OV;
next_state .;

VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
next_state X;

VC S_U_NOP_NGA 20
do L;
if(NRP){
    next_state 22;
}else{
    if(ii){
        do +v;
    }
    next_state 23;
}
send la;

VC S_U_NOP_NGA 20
next_state X;

VC S_U_NOP_NGA 20
next_state X;

VC S_U_NOP_NGA 20
assert ic;
do +i;
if(Oc){
    assert T;
    send uncv;
    next_state 8;
}else{
    next_state .;
}

VC E_U_NOP_NGA 21
do L;
if(NRP){
    next_state 22;
}else{
    if(ii){
        do +v;
    }
    next_state 23;
}
send la;

VC E_U_NOP_NGA 21
next_state X;

VC E_U_NOP_NGA 21
next_state X;

VC E_U_NOP_NGA 21
assert ic;
do +i;
assert OC;
next_state .;





















C.2. PARENT NODE TRANSITION TABLE



L    A    Al    TA


VC S_L_NOP_NGA 22
send Ej
next_state .;

VC S_L_NOP_NGA 22
assert ivc;
do +i;
if(Ov){
    if(Oc){
        next_state X;
    }else{
        next_state 3;
    }
}else{
    if(Oc){
        send uncv;
        next_state 9;
    }else{
        next_state .;
    }
}

VC S_L_NOP_NGA 22
next_state X;

VC S_L_NOP_NGA 22
assert ivc;
do +v;
if(Oc){
    assert T;
    send uncv;
    next_state 9;
}else{
    next_state .;
}

VC S_L_YOP_NGA 23
send Ej
next_state .;

VC S_L_YOP_NGA 23
assert ivc;
do +i;
if(Ov){
    if(Oc){
        next_state X;
    }else{
        next_state 4;
    }
}else{
    if(Oc){
        next_state 10;
    }else{
        next_state .;
    }
}

VC S_L_YOP_NGA 23
assert ivc;
do +c;
if(Ov){
    next_state 6;
}else{
    next_state 25;
}

VC S_L_YOP_NGA 23
assert 1 \Y
assert ivc;
do +v;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 10;
}else{
    next_state .;
}

VC E_L_YOP_NGA 24
send Et
next_state .;

VC E_L_YOP_NGA 24
assert ivc;
do +i;
if(Ov){
    if(Oc){
        next_state X;
    }else{
        next_state 5;
    }
}else{
    if(Oc){
        next_state 11;
    }else{
        next_state .;
    }
}

VC E_L_YOP_NGA 24
assert ivc;
do +c;
if(lvc){
    send wd;
    next_state 2;
}else{
    if(Ov){
        next_state 7;
    }else{
        next_state 26;
    }
}

VC E_L_YOP_NGA 24
assert l^
assert ivc;
do +v;
if(Oc){
    next_state 11;
}else{
    next_state .;
}
L    A    Al    TA


VC S_L_YOP_YGA 25
send Ej
next_state .;

VC S_L_YOP_YGA 25
assert ivc;
do +i;
if(lvc){
    assert T;
    if(lv){
        do +c;
    }
    send al;
    next_state 3;
}else{
    if(Ov){
        next_state 6;
    }else{
        if(Oc){
            next_state 12;
        }else{
            next_state .;
        }
    }
}

VC S_L_YOP_YGA 25
next_state X;

VC S_L_YOP_YGA 25
assert l'V(
assert ivc;
do +v;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 12;
}else{
    next_state .;
}

VC E_L_YOP_YGA 26
send Ej
next_state .;

VC E_L_YOP_YGA 26
assert ivc;
do +i;
if(lvc){
    if(lv){
        do +c;
    }
    send wd;
    next_state 2;
}else{
    if(Ov){
        next_state 7;
    }else{
        if(Oc){
            next_state 13;
        }else{
            next_state .;
        }
    }
}

VC E_L_YOP_YGA 26
next_state X;

VC E_L_YOP_YGA 26
assert l'V(
assert ivc;
do +v;
if(Oc){
    next_state 13;
}else{
    next_state .;
}

VWC S_U_NOP_NGA 27
do L;
if(NRP){
    next_state 29;
}else{
    if(ii){
        do +v;
    }
    next_state 30;
}
send la;

VWC S_U_NOP_NGA 27
next_state X;

VWC S_U_NOP_NGA 27
next_state X;

VWC S_U_NOP_NGA 27
assert ic;
do +i;
if(Oc){
    assert T;
    send uncv;
    next_state 14;
}else{
    next_state .;
}
L    A    Al    TA


VWC E_U_NOP_NGA 28
do L;
if(NRP){
    next_state 29;
}else{
    if(ii){
        do +v;
    }
    next_state 30;
}
send la;

VWC E_U_NOP_NGA 28
next_state X;

VWC E_U_NOP_NGA 28
next_state X;

VWC E_U_NOP_NGA 28
assert ic;
do +i;
assert OC;
next_state .;

VWC S_L_NOP_NGA 29
send Et
next_state .;

VWC S_L_NOP_NGA 29
assert ivc;
do +i;
assert OV;
if(Oc){
    send uncv;
    next_state 15;
}else{
    next_state .;
}

VWC S_L_NOP_NGA 29
next_state X;

VWC S_L_NOP_NGA 29
assert ivc;
do +v;
if(Oc){
    assert T;
    send uncv;
    next_state 15;
}else{
    next_state .;
}

VWC S_L_YOP_NGA 30
send Et
next_state .;

VWC S_L_YOP_NGA 30
assert ivc;
do +i;
assert OV;
if(Oc){
    next_state 16;
}else{
    next_state .;
}

VWC S_L_YOP_NGA 30
assert ivc;
do +c;
assert OV;
next_state 32;

VWC S_L_YOP_NGA 30
assert 1 AY
assert ivc;
do +v;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 16;
}else{
    next_state .;
}

VWC E_L_YOP_NGA 31
send Et
next_state .;

VWC E_L_YOP_NGA 31
assert ivc;
do +i;
assert OV;
if(Oc){
    next_state 17;
}else{
    next_state .;
}

VWC E_L_YOP_NGA 31
assert ivc;
do +c;
assert OV;
next_state 33;

VWC E_L_YOP_NGA 31
assert 1 AY
assert ivc;
do +v;
if(Oc){
    next_state 17;
}else{
    next_state .;
}
L    A    Al    TA


VWC S_L_YOP_YGA 32
send Ej
next_state .;

VWC S_L_YOP_YGA 32
assert ivc;
do +i;
assert OV;
if(Oc){
    next_state 18;
}else{
    next_state .;
}

VWC S_L_YOP_YGA 32
next_state X;

VWC S_L_YOP_YGA 32
assert l'V(
assert ivc;
do +v;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 18;
}else{
    next_state .;
}

VWC E_L_YOP_YGA 33
send Et
next_state .;

VWC E_L_YOP_YGA 33
assert ivc;
do +i;
assert OV;
if(Oc){
    next_state 19;
}else{
    next_state .;
}

VWC E_L_YOP_YGA 33
next_state X;

VWC E_L_YOP_YGA 33
assert 1 AY
assert ivc;
do +v;
if(Oc){
    next_state 19;
}else{
    next_state .;
}
CTE    FLCFT    RFLCFT    CV


INVALID 0
next_state X;

INVALID 0
if(t){
    alloc;
    next_state 2;
}else{
    send flcft;
    next_state .;
}

INVALID 0
assert T;
send rflcft;
next_state .;

INVALID 0
next_state X;

C S_U_NOP_NGA 1
do +e;
next_state 2;

C S_U_NOP_NGA 1
send rt;
next_state .;

C S_U_NOP_NGA 1
send rt;
next_state .;

C S_U_NOP_NGA 1
assert ic;
next_state .;

C E_U_NOP_NGA 2
next_state X;

C E_U_NOP_NGA 2
send rt;
next_state .;

C E_U_NOP_NGA 2
send rt;
next_state .;

C E_U_NOP_NGA 2
assert ic;
next_state .;

C S_L_NOP_NGA 3
next_state X;

C S_L_NOP_NGA 3
send Ej
next_state .;

C S_L_NOP_NGA 3
send Ej
next_state .;

C S_L_NOP_NGA 3
assert ic;
next_state .;

C S_L_YOP_NGA 4
next_state X;

C S_L_YOP_NGA 4
send Ej
next_state .;

C S_L_YOP_NGA 4
send Ej
next_state .;

C S_L_YOP_NGA 4
assert ic;
next_state .;

C E_L_YOP_NGA 5
next_state X;

C E_L_YOP_NGA 5
send Ej
next_state .;

C E_L_YOP_NGA 5
send Ej
next_state .;

C E_L_YOP_NGA 5
assert ic;
next_state .;

C S_L_YOP_YGA 6
next_state X;

C S_L_YOP_YGA 6
send Ej
next_state .;

C S_L_YOP_YGA 6
send Ej
next_state .;

C S_L_YOP_YGA 6
assert ic;
next_state .;

C E_L_YOP_YGA 7
next_state X;

C E_L_YOP_YGA 7
send Ej
next_state .;

C E_L_YOP_YGA 7
send Ej
next_state .;

C E_L_YOP_YGA 7
assert ic;
next_state .;

V S_U_NOP_NGA 8
next_state X;

V S_U_NOP_NGA 8
assert T;
send flcft;
next_state .;

V S_U_NOP_NGA 8
assert T;
send rflcft;
next_state .;

V S_U_NOP_NGA 8
assert T;
assert iv;
do +c;
send cv;
if(Ov){
    next_state 1;
}else{
    next_state 20;
}

V S_L_NOP_NGA 9
next_state 5;

V S_L_NOP_NGA 9
send Ej
next_state .;

V S_L_NOP_NGA 9
send Ej
next_state .;

V S_L_NOP_NGA 9
assert T;
assert iv;
do +c;
send cv;
if(Ov){
    next_state 3;
}else{
    next_state 22;
}
CTE    FLCFT    RFLCFT    CV


V S_L_YOP_NGA 10
next_state X;

V S_L_YOP_NGA 10
send Ej
next_state .;

V S_L_YOP_NGA 10
send Ej
next_state .;

V S_L_YOP_NGA 10
assert iv;
do +c;
if(1){
    send cv;
}
if(Ov){
    next_state 4;
}else{
    next_state 23;
}

V E_L_YOP_NGA 11
next_state X;

V E_L_YOP_NGA 11
send Ej
next_state .;

V E_L_YOP_NGA 11
send Ej
next_state .;

V E_L_YOP_NGA 11
assert iv;
do +c;
if(Ov){
    next_state 5;
}else{
    next_state 24;
}

V S_L_YOP_YGA 12
next_state X;

V S_L_YOP_YGA 12
send Ej
next_state .;

V S_L_YOP_YGA 12
send Ej
next_state .;

V S_L_YOP_YGA 12
assert iv;
do +c;
if(1){
    send cv;
}
if(Ov){
    next_state 6;
}else{
    next_state 25;
}

V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
send Ej
next_state .;

V E_L_YOP_YGA 13
send Ej
next_state .;

V E_L_YOP_YGA 13
assert iv;
do +c;
if(Ov){
    next_state 7;
}else{
    next_state 26;
}

VW S_U_NOP_NGA 14
next_state X;

VW S_U_NOP_NGA 14
assert T;
send flcft;
next_state .;

VW S_U_NOP_NGA 14
assert T;
send rflcft;
next_state .;

VW S_U_NOP_NGA 14
assert T;
assert iv;
do +c;
send rdaw, c;
send cv;
if(Ov){
    next_state 1;
}else{
    next_state 20;
}
CTE    FLCFT    RFLCFT    CV


VW S_L_NOP_NGA 15
next_state X;

VW S_L_NOP_NGA 15
send Ej
next_state .;

VW S_L_NOP_NGA 15
send Ej
next_state .;

VW S_L_NOP_NGA 15
assert T;
assert iv;
do +c;
send rdaw, c;
send cv;
if(Ov){
    next_state 3;
}else{
    next_state 22;
}

VW S_L_YOP_NGA 16
next_state X;

VW S_L_YOP_NGA 16
send Ej
next_state .;

VW S_L_YOP_NGA 16
send Ej
next_state .;

VW S_L_YOP_NGA 16
assert iv;
do +c;
if(1){
    send cv;
}
send rdaw, c;
if(Ov){
    next_state 4;
}else{
    next_state 23;
}

VW E_L_YOP_NGA 17
next_state X;

VW E_L_YOP_NGA 17
send Ej
next_state .;

VW E_L_YOP_NGA 17
send Ej
next_state .;

VW E_L_YOP_NGA 17
assert iv;
do +c;
send rdaw, c;
if(Ov){
    next_state 5;
}else{
    next_state 24;
}

VW S_L_YOP_YGA 18
next_state X;

VW S_L_YOP_YGA 18
send Ej
next_state .;

VW S_L_YOP_YGA 18
send Ej
next_state .;

VW S_L_YOP_YGA 18
assert iv;
do +c;
if(1){
    send cv;
}
send rdaw, c;
if(Ov){
    next_state 6;
}else{
    next_state 25;
}
CTE    FLCFT    RFLCFT    CV


VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
send Ej
next_state .;

VW E_L_YOP_YGA 19
send Ej
next_state .;

VW E_L_YOP_YGA 19
assert iv;
do +c;
send rdaw, c;
if(Ov){
    next_state 7;
}else{
    next_state 26;
}

VC S_U_NOP_NGA 20
do +e;
next_state 21;

VC S_U_NOP_NGA 20
send rt;
next_state .;

VC S_U_NOP_NGA 20
send rt;
next_state .;

VC S_U_NOP_NGA 20
assert ivc;
do +c;
if(Ov){
    next_state 1;
}else{
    next_state .;
}

VC E_U_NOP_NGA 21
next_state X;

VC E_U_NOP_NGA 21
send rt;
next_state .;

VC E_U_NOP_NGA 21
send rt;
next_state .;

VC E_U_NOP_NGA 21
assert ivc;
do +c;
if(Ov){
    next_state 2;
}else{
    next_state .;
}

VC S_L_NOP_NGA 22
next_state X;

VC S_L_NOP_NGA 22
send Ej
next_state .;

VC S_L_NOP_NGA 22
send Ej
next_state .;

VC S_L_NOP_NGA 22
assert ivc;
do +c;
if(Ov){
    next_state 3;
}else{
    next_state .;
}

VC S_L_YOP_NGA 23
next_state X;

VC S_L_YOP_NGA 23
send Ej
next_state .;

VC S_L_YOP_NGA 23
send Ej
next_state .;

VC S_L_YOP_NGA 23
assert ivc;
do +c;
if(Ov){
    next_state 4;
}else{
    next_state .;
}

VC E_L_YOP_NGA 24
next_state X;

VC E_L_YOP_NGA 24
send Ej
next_state .;

VC E_L_YOP_NGA 24
send Ej
next_state .;

VC E_L_YOP_NGA 24
assert ivc;
do +c;
if(Ov){
    next_state 5;
}else{
    next_state .;
}
CTE    FLCFT    RFLCFT    CV


VC S_L_YOP_YGA 25 


VC S J..YOP.YGA 25 


VC S J..YOP.YGA25 


VC S J..YOP.YGA 25 


next _st at e X; 


send Et 


send Ej 


assert i vc; 




next .state . ; 


next .state . ; 


do -fc; 

if(1){ 
send cv; 

} 
if(Ov){ 

next .state 6; 
}else{ 

next .state . ; 

} 


VC E_L_YOP_YGA 26
next_state X;

VC E_L_YOP_YGA 26
send Ej
next_state .;

VC E_L_YOP_YGA 26
send Ej
next_state .;

VC E_L_YOP_YGA 26
assert i vc;
do -fc;
if(Ov){
    next_state 7;
} else {
    next_state .;
}


VWC S_U_NOP_NGA 27
do 4€;
next_state 28;

VWC S_U_NOP_NGA 27
send rt;
next_state .;

VWC S_U_NOP_NGA 27
send rt;
next_state .;

VWC S_U_NOP_NGA 27
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 1;
} else {
    next_state 20;
}


VWC E_U_NOP_NGA 28
next_state X;

VWC E_U_NOP_NGA 28
send rt;
next_state .;

VWC E_U_NOP_NGA 28
send rt;
next_state .;

VWC E_U_NOP_NGA 28
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 2;
} else {
    next_state 21;
}


VWC S_L_NOP_NGA 29
next_state X;

VWC S_L_NOP_NGA 29
send Ej
next_state .;

VWC S_L_NOP_NGA 29
send Ej
next_state .;

VWC S_L_NOP_NGA 29
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 3;
} else {
    next_state 22;
}
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CTE 


FLCFT 


RFLCFT 


CV


VWC S_L_YOP_NGA 30
next_state X;

VWC S_L_YOP_NGA 30
send Et
next_state .;

VWC S_L_YOP_NGA 30
send Et
next_state .;

VWC S_L_YOP_NGA 30
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 4;
} else {
    next_state 23;
}


VWC E_L_YOP_NGA 31
next_state X;

VWC E_L_YOP_NGA 31
send Et
next_state .;

VWC E_L_YOP_NGA 31
send Et
next_state .;

VWC E_L_YOP_NGA 31
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 5;
} else {
    next_state 24;
}


VWC S_L_YOP_YGA 32
next_state X;

VWC S_L_YOP_YGA 32
send Et
next_state .;

VWC S_L_YOP_YGA 32
send Et
next_state .;

VWC S_L_YOP_YGA 32
assert i vc;
do -fc;
send rdaw, c;
if(1){
    send cv;
}
if(Ov){
    next_state 6;
} else {
    next_state 25;
}


VWC E_L_YOP_YGA 33
next_state X;

VWC E_L_YOP_YGA 33
send Et
next_state .;

VWC E_L_YOP_YGA 33
send Et
next_state .;

VWC E_L_YOP_YGA 33
assert i vc;
do -fc;
send rdaw, c;
if(Ov){
    next_state 7;
} else {
    next_state 26;
}
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RD 


UNCV 


RT 


WD


INVALID
next_state X;

INVALID
next_state X;

INVALID
assert T;
send rflcft;
next_state .;

INVALID
next_state ^


C S_U_NOP_NGA 1
next_state .;

C S_U_NOP_NGA 1
assert ic;
do -fv;
if(Oc){
    assert T;
    send uncv;
    next_state 8;
} else {
    next_state 20;
}

C S_U_NOP_NGA 1
send rt;
next_state .;

C S_U_NOP_NGA 1
next_state X;






C E_U_NOP_NGA 2
next_state X;

C E_U_NOP_NGA 2
assert ic;
do -fv;
assert OQ
next_state 21;

C E_U_NOP_NGA 2
send rt;
next_state .;

C E_U_NOP_NGA 2
next_state ^






C S_L_NOP_NGA 3
next_state .;

C S_L_NOP_NGA 3
assert ic;
do -fv;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 9;
} else {
    next_state 22;
}

C S_L_NOP_NGA 3
send Ej
next_state .;

C S_L_NOP_NGA 3
send wd;
next_state 2;






C S_L_YOP_NGA 4
next_state .;

C S_L_YOP_NGA 4
assert ic;
do -fv;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 10;
} else {
    next_state 23;
}

C S_L_YOP_NGA 4
send Ej
next_state .;

C S_L_YOP_NGA 4
next_state X;






C E_L_YOP_NGA 5
next_state X;

C E_L_YOP_NGA 5
assert ic;
do -fv;
if(Oc){
    next_state 11;
} else {
    next_state 24;
}

C E_L_YOP_NGA 5
send Et
next_state .;

C E_L_YOP_NGA 5
next_state 5;
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RD 


UNCV 


RT 


WD


C S_L_YOP_YGA 6
next_state .;

C S_L_YOP_YGA 6
assert i c;
do -fv;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 12;
} else {
    next_state 25;
}

C S_L_YOP_YGA 6
send Et
next_state .;

C S_L_YOP_YGA 6
next_state ^






C E_L_YOP_YGA 7
next_state X;

C E_L_YOP_YGA 7
assert i c;
do -fv;
if(Oc){
    next_state 13;
} else {
    next_state 26;
}

C E_L_YOP_YGA 7
send Ej
next_state .;

C E_L_YOP_YGA 7
next_state X;






V S_U_NOP_NGA 8
send rdav, c;
next_state 1;

V S_U_NOP_NGA 8
next_state X;

V S_U_NOP_NGA 8
assert T;
send rflcft;
next_state .;

V S_U_NOP_NGA 8
next_state ^




V S_L_NOP_NGA 9
send rdav, c;
next_state 3;

V S_L_NOP_NGA 9
next_state X;

V S_L_NOP_NGA 9
send Et
next_state .;

V S_L_NOP_NGA 9
next_state ^




V S_L_YOP_NGA 10
send rdavL, c;
if(Oc){
    next_state .;
} else {
    next_state 23;
}

V S_L_YOP_NGA 10
next_state X;

V S_L_YOP_NGA 10
send Et
next_state .;

V S_L_YOP_NGA 10
next_state ^








V E_L_YOP_NGA 11
next_state X;

V E_L_YOP_NGA 11
next_state X;

V E_L_YOP_NGA 11
send Et
next_state .;

V E_L_YOP_NGA 11
next_state ^


V S_L_YOP_YGA 12
send rdavL, c;
if(Oc){
    next_state .;
} else {
    next_state 25;
}

V S_L_YOP_YGA 12
next_state X;

V S_L_YOP_YGA 12
send Et
next_state .;

V S_L_YOP_YGA 12
next_state 5;








V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
next_state X;

V E_L_YOP_YGA 13
send Et
next_state .;

V E_L_YOP_YGA 13
next_state 5;
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RD 


UNCV


RT 


WD


VW S_U_NOP_NGA 14
send rdavw, c;
next_state 1;

VW S_U_NOP_NGA 14
next_state X;

VW S_U_NOP_NGA 14
assert T;
send rflcft;
next_state .;

VW S_U_NOP_NGA 14
next_state ^




VW S_L_NOP_NGA 15
send rdavw, c;
next_state 3;

VW S_L_NOP_NGA 15
next_state X;

VW S_L_NOP_NGA 15
send Ej
next_state .;

VW S_L_NOP_NGA 15
next_state X;




VW S_L_YOP_NGA 16
send rdav\dj, c;
next_state 23;

VW S_L_YOP_NGA 16
next_state X;

VW S_L_YOP_NGA 16
send Et
next_state .;

VW S_L_YOP_NGA 16
next_state ^




VW E_L_YOP_NGA 17
next_state X;

VW E_L_YOP_NGA 17
next_state X;

VW E_L_YOP_NGA 17
send Ej
next_state .;

VW E_L_YOP_NGA 17
next_state X;


VW S_L_YOP_YGA 18
send rdav\dj, c;
next_state 25;

VW S_L_YOP_YGA 18
next_state X;

VW S_L_YOP_YGA 18
send Et
next_state .;

VW S_L_YOP_YGA 18
next_state ^




VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
next_state X;

VW E_L_YOP_YGA 19
send Ej
next_state .;

VW E_L_YOP_YGA 19
next_state X;


VC S_U_NOP_NGA 20
next_state .;

VC S_U_NOP_NGA 20
assert i c;
do -tv;
if(Oc){
    assert T;
    send uncv;
    next_state 8;
} else {
    next_state .;
}

VC S_U_NOP_NGA 20
send rt;
next_state .;

VC S_U_NOP_NGA 20
next_state ^






VC E_U_NOP_NGA 21
next_state X;

VC E_U_NOP_NGA 21
assert i c;
do -tv;
assert OQ
next_state .;

VC E_U_NOP_NGA 21
send rt;
next_state .;

VC E_U_NOP_NGA 21
next_state ^






VC S_L_NOP_NGA 22
next_state .;

VC S_L_NOP_NGA 22
assert i c;
do -tv;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 9;
} else {
    next_state .;
}

VC S_L_NOP_NGA 22
send Ej
next_state .;

VC S_L_NOP_NGA 22
next_state X;
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RD 


UNCV 


RT 


WD


VC S_L_YOP_NGA 23
next_state .;

VC S_L_YOP_NGA 23
assert i c;
do -K;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 10;
} else {
    next_state .;
}

VC S_L_YOP_NGA 23
send Et
next_state .;

VC S_L_YOP_NGA 23
next_state ^






VC E_L_YOP_NGA 24
next_state X;

VC E_L_YOP_NGA 24
assert i c;
do -fv;
if(Oc){
    next_state 11;
} else {
    next_state .;
}

VC E_L_YOP_NGA 24
send Ej
next_state .;

VC E_L_YOP_NGA 24
next_state X;






VC S_L_YOP_YGA 25
if(lcMc){
    send rdavL, c;
    next_state 6;
} else {
    next_state .;
}

VC S_L_YOP_YGA 25
assert i c;
do -fv;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 12;
} else {
    next_state .;
}

VC S_L_YOP_YGA 25
send Et
next_state .;

VC S_L_YOP_YGA 25
next_state ^






VC E_L_YOP_YGA 26
next_state X;

VC E_L_YOP_YGA 26
assert i c;
do -fv;
if(Oc){
    next_state 13;
} else {
    next_state .;
}

VC E_L_YOP_YGA 26
send Ej
next_state .;

VC E_L_YOP_YGA 26
next_state X;






VWC S_U_NOP_NGA 27
next_state .;

VWC S_U_NOP_NGA 27
assert i c;
do -fv;
if(Oc){
    assert 15
    send uncv;
    next_state 14;
} else {
    next_state .;
}

VWC S_U_NOP_NGA 27
send rt;
next_state .;

VWC S_U_NOP_NGA 27
next_state 5;
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RD 


UNCV 


RT 


WD


VWC E_U_NOP_NGA 28
next_state X;

VWC E_U_NOP_NGA 28
assert i c;
do -K;
assert OQ
next_state .;

VWC E_U_NOP_NGA 28
send rt;
next_state .;

VWC E_U_NOP_NGA 28
next_state ^


VWC S_L_NOP_NGA 29
next_state .;

VWC S_L_NOP_NGA 29
assert i c;
do -K;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 15;
} else {
    next_state .;
}

VWC S_L_NOP_NGA 29
send Et
next_state .;

VWC S_L_NOP_NGA 29
next_state ^


VWC S_L_YOP_NGA 30
next_state .;

VWC S_L_YOP_NGA 30
assert i c;
do -K;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 16;
} else {
    next_state .;
}

VWC S_L_YOP_NGA 30
send Et
next_state .;

VWC S_L_YOP_NGA 30
next_state ^


VWC E_L_YOP_NGA 31
next_state X;

VWC E_L_YOP_NGA 31
assert i c;
do -fv;
if(Oc){
    next_state 17;
} else {
    next_state .;
}

VWC E_L_YOP_NGA 31
send Et
next_state .;

VWC E_L_YOP_NGA 31
next_state ^


VWC S_L_YOP_YGA 32
if(lcMc){
    send rdav\dj, c;
    next_state 6;
} else {
    next_state .;
}

VWC S_L_YOP_YGA 32
assert i c;
do -K;
if(OcM){
    send uncv;
}
if(Oc){
    next_state 18;
} else {
    next_state .;
}

VWC S_L_YOP_YGA 32
send Et
next_state .;

VWC S_L_YOP_YGA 32
next_state ^






RD 


UNCV 


RT 


WD


VWC E_L_YOP_YGA 33
next_state X;

VWC E_L_YOP_YGA 33
assert i c;
do -K;
if(Oc){
    next_state 19;
} else {
    next_state .;
}

VWC E_L_YOP_YGA 33
send Et
next_state .;

VWC E_L_YOP_YGA 33
next_state ^
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